JP4533126B2 - Proximity sound separation and collection method, proximity sound separation and collection device, proximity sound separation and collection program, and recording medium
- Publication number
- JP4533126B2 (application JP2004373810A)
- Authority
- JP
- Japan
- Prior art keywords
- band
- signal
- sound
- value
- noise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Description
The present invention relates to a proximity sound separation and collection method and a proximity sound separation and collection device that, in an environment where a target sound source close to a microphone and a noise source far from the microphone sound simultaneously, suppress the noise signal and collect the target sound with a high S/N ratio.
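Conventional methods for suppressing noise and emphasizing a target sound when both sound simultaneously include the spectral subtraction method (Non-Patent Document 1). It uses a single microphone, takes speech as the target sound, assumes noise whose variation over time is slow, such as air-conditioning noise (hereinafter, stationary noise), and exploits that stationarity to subtract the spectrum of the noise signal from the spectrum of the mixed signal.
A microphone array method (Non-Patent Document 2), which suppresses noise using a plurality of microphones, has also been proposed.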
Because the noise suppression method proposed in Non-Patent Document 1 relies on the stationarity of the noise, it has difficulty suppressing non-stationary noise signals such as speech and music; this is the first problem. The noise suppression method proposed in Non-Patent Document 2 requires at least two microphones, which increases the scale of the apparatus; this is the second problem.
According to a first embodiment of the present invention, the device comprises: band dividing means that divides each output signal of the voice input means into a plurality of band signals within the voice band; band-specific feature calculation means that calculates an acoustic feature of each band signal; band-specific signal determination means that determines, based on the feature calculated for each band, whether the band signal is mainly composed of the target sound source signal or mainly composed of noise; band-specific weight value determination means that determines a weight value for each band based on that determination; band-specific weight value multiplication means that multiplies each band signal by the determined weight value; and signal synthesis means that returns the weighted signals to a time waveform.
According to a second embodiment of the present invention, in the proximity sound separation and collection device of the first embodiment, the band-specific feature calculation means calculates the power value of each band signal, and the band-specific signal determination means determines a band signal whose power value is equal to or above a preset threshold to be a band signal mainly composed of the target sound signal, and a band signal whose power value is equal to or below the threshold to be a band signal mainly composed of noise.
According to a third embodiment of the present invention, in the proximity sound separation and collection device of the first embodiment, the band-specific feature calculation means calculates the sharpness as the feature of each band signal, and the band-specific signal determination means determines a band signal whose sharpness is equal to or above a preset threshold to be a band signal mainly composed of the target sound signal, and a band signal whose sharpness is equal to or below the threshold to be a band signal mainly composed of noise.
According to a fourth embodiment of the present invention, in either of the proximity sound separation and collection devices of the second or third embodiment, threshold calculation means that calculates the threshold from the feature values calculated by the band-specific feature calculation means is added, and the band-specific signal determination means makes its determination according to the threshold calculated by the threshold calculation means.
Each embodiment is used only under the conditions that the target sound is speech and that the target sound source is closer to the microphone than the noise source. The target speech is estimated from the noisy signal by exploiting the sparseness of speech signals, that is, the property that frequencies with large power are localized in specific bands.
The band dividing means divides the band finely enough that the signal in each band consists mainly of a single acoustic signal component, that is, finely enough that the spectrum of the target sound can be separated; a concrete example is a band width of about 20 Hz. Because the target sound source is closer to the microphone than the noise source, the target sound signal is assumed to be received by the microphone at a higher level than the noise signal.
The received signal is divided into bands by the band dividing means, and the band-specific feature calculation means calculates an acoustic feature for each band. Based on the feature calculated for each band, the band-specific weight value determination means determines whether each frequency component belongs to the target sound source close to the microphone or to a noise source arriving from a distance, and determines the weight value $\alpha(\omega_i)$ from that determination. For example, when the power of each band is used as the feature, the fact that the signal power of the target sound source is larger than that of the noise is exploited: a band whose power is larger than a predetermined threshold is determined to be target signal, and the weight value multiplied with that band is set to, for example, $\alpha(\omega_i) = 1.0$. A band whose power is smaller than the threshold is determined to be a noise signal component and is given a weight value $\alpha(\omega_i)$ close to zero ($0 < \alpha(\omega_i) < 1$).
When the sharpness of the signal (defined in detail in the examples) is used as the feature, the property that the sharpness of a nearby sound source is large and that of a distant sound source is small is exploited: a band whose sharpness is at or below a certain threshold is determined to be a noise signal component and is given a weight value close to zero, for example $\alpha(\omega_i)$ with $0 < \alpha(\omega_i) < 1$.
The band-specific weight value multiplication means multiplies each band signal $X(\omega_i)$ by the determined weight value $\alpha(\omega_i)$. The signals weighted in this way are returned to a time waveform by the signal synthesis means.
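The threshold decision and weighting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `band_weights` and `apply_weights`, the noise weight of 0.01, the 0 dB threshold, and the sample band values are all hypothetical and chosen only for the example.

```python
import math

def band_weights(features, threshold, noise_weight=0.01):
    """One weight per band: 1.0 where the feature exceeds the threshold,
    otherwise a small weight close to zero (assumed value)."""
    return [1.0 if f > threshold else noise_weight for f in features]

def apply_weights(band_signals, weights):
    """Multiply each band signal X(w_i) by its weight alpha(w_i)."""
    return [w * x for w, x in zip(weights, band_signals)]

# Hypothetical band signals: two strong (target-like) and two weak (noise-like).
band_signals = [4 + 0j, 0.2 + 0.1j, 3j, 0.05 + 0j]

# Power in dB is used as the feature here; sharpness could be used the same way.
powers_db = [20.0 * math.log10(abs(x)) for x in band_signals]

weights = band_weights(powers_db, threshold=0.0)
weighted = apply_weights(band_signals, weights)
```

The same two functions apply whether the feature is power or sharpness; only the feature values and the threshold change.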
With the configuration of the present invention, the target speech can be recovered from a noisy signal without relying on a property of the noise (stationarity). The invention can therefore also handle noise sources that are non-stationary signals such as speech or music. That is, the first problem described above is solved.
In addition, because the present invention can be realized with a single microphone, the scale of the apparatus can be kept small. The second problem described above is thereby also solved.
The proximity sound separation and collection device according to the present invention can be built entirely in hardware, but the best mode is to install a proximity sound separation and collection program, written in a computer-readable programming language, on a computer and have the computer function as the proximity sound separation and collection device.
When a computer is made to function as the proximity sound separation and collection device according to the present invention, the band dividing means, band-specific feature calculation means, band-specific signal determination means, band-specific weight value determination means, band-specific weight value multiplication means, and signal synthesis means are constructed on the computer, which then functions as the proximity sound separation and collection device.
FIG. 1 shows an example of the proximity sound separation and collection device proposed in claim 5 of the present invention. The input means 1 is, for example, a microphone. Let the signal of the target sound source M be $s(t)$ and the signal of the noise source N be $n(t)$. To simplify the explanation, a single noise source N is assumed here, but in general there may be several.
The band dividing means 2 divides the voice band into a plurality of bands by, for example, a fast Fourier transform. The band signals $X(\omega_1), X(\omega_2), \ldots, X(\omega_N)$ are divided finely enough that each consists mainly of a single acoustic signal component. Here, a single acoustic signal component means one spectral component contained in the signals $s(t)$ and $n(t)$; it suffices to divide finely enough that the individual spectral components can be separated (for details, see Japanese Patent No. 3355598).
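As a rough worked example of how fine the division has to be, the sketch below computes the transform length needed to reach the band width of about 20 Hz that the description gives as a concrete example. The 16 kHz sampling rate and the function name `frame_length_for_resolution` are assumptions made for illustration; the patent does not state a sampling rate.

```python
import math

def frame_length_for_resolution(sample_rate_hz, resolution_hz):
    """Smallest DFT length N whose bin spacing sample_rate/N is at most resolution_hz."""
    return math.ceil(sample_rate_hz / resolution_hz)

sample_rate_hz = 16000                              # assumed sampling rate
n = frame_length_for_resolution(sample_rate_hz, 20.0)
bin_spacing_hz = sample_rate_hz / n                 # spacing between adjacent bands
frame_ms = 1000 * n / sample_rate_hz                # duration of one analysis frame
```

Under these assumptions a 20 Hz band width corresponds to 800 samples, or 50 ms of signal per frame. Finer bands need proportionally longer frames, which is the usual trade-off between frequency and time resolution.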
The band-specific feature calculation means 3 calculates an acoustic feature $\tau(\omega_i)$ of the signal for each frequency band. The feature is, for example, the power or the sharpness of the signal. Here the description uses the signal power proposed in claim 6 of the present invention as the feature. The band-specific feature calculation means 3 therefore outputs the power values $20\log_{10}|X(\omega_1)|, 20\log_{10}|X(\omega_2)|, \ldots, 20\log_{10}|X(\omega_N)|$ of the band signals $X(\omega_1), X(\omega_2), \ldots, X(\omega_N)$.
The band-specific signal determination means 4 determines the attribute of each band signal $X(\omega_1), X(\omega_2), \ldots, X(\omega_N)$ from the power value of each band. Because the noise arrives from farther away than the target sound, the noise signal $n(t)$ can be assumed to be received at a lower level than the target sound signal $s(t)$. That is, the band signals are considered to have a spectrum like that shown in FIG. 2. As shown in FIG. 2, a band whose power exceeds the threshold $T$ is estimated to be mainly composed of the target signal $s(t)$, and a band whose power is at or below $T$ is estimated to be mainly composed of the noise signal $n(t)$. The band-specific signal determination means 4 applies this decision rule to determine the attribute of each band signal and passes the result to the band-specific weight value determination means 5.
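The band-specific weight value determination means 5 assigns a band determined to be target sound signal $s(t)$ a weight value of, for example, $\alpha(\omega_i) = 1.0$, and assigns a band determined to be noise signal $n(t)$ a weight value in the range $0 \le \alpha(\omega_i) \le 1$, chosen as close to 0 as possible. The weight value assigned to a band determined to be target sound signal need not be exactly 1; it only has to be larger than the weight value given to the noise bands.
The weight values $\alpha(\omega_1), \alpha(\omega_2), \ldots, \alpha(\omega_N)$ determined by the band-specific weight value determination means 5 are passed to the band-specific weight value multiplication means 6, which multiplies them with the band signals $X(\omega_1), X(\omega_2), \ldots, X(\omega_N)$. The weighted band signals $\alpha(\omega_1) X(\omega_1), \alpha(\omega_2) X(\omega_2), \ldots, \alpha(\omega_N) X(\omega_N)$ are input to the signal synthesis means 7, which returns them to a time signal using, for example, an inverse Fourier transform. Because the bands determined to be noise were given weight values as close to 0 as possible, only a small noise component remains in this time signal, and the S/N ratio of the target sound signal $s(t)$ improves.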
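The chain of means 2 through 7 can be illustrated end to end with a small pure-Python sketch. It is a simplified illustration under several assumptions, not the device itself: it transforms the whole signal with one plain DFT instead of a framewise FFT, and the names `dft`, `idft`, `power_db`, and `separate`, the 20 dB threshold, the noise weight of 0.001, and the synthetic test tones are all hypothetical.

```python
import cmath
import math

def dft(x):
    """Band division: plain discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Signal synthesis: inverse DFT, keeping the real part."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def power_db(X, floor=1e-12):
    """Feature: 20*log10|X(w_i)| per band, clipped to avoid log of zero."""
    return [20.0 * math.log10(max(abs(v), floor)) for v in X]

def separate(x, threshold_db, noise_weight=0.001):
    """Weight bands above the threshold by 1.0 and all others by noise_weight."""
    X = dft(x)
    alpha = [1.0 if p > threshold_db else noise_weight for p in power_db(X)]
    return idft([a * v for a, v in zip(alpha, X)]), alpha

N = 64
# Strong tone in bin 4 stands in for the nearby target, weak tone in bin 10 for distant noise.
target = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise = [0.1 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
mixed = [s + v for s, v in zip(target, noise)]

out, alpha = separate(mixed, threshold_db=20.0)
```

In this toy case the target tone has a band power of about 30 dB and the noise tone about 10 dB, so the 20 dB threshold keeps only bins 4 and 60 (the tone and its mirror image) and the output is close to the clean target. Real speech would need framewise processing with overlapping windows, which this sketch leaves out.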
FIG. 3 shows an example of the proximity sound separation and collection device proposed in claim 7 of the present invention. In this example the feature calculated by the feature calculation means 3 is the sharpness $J(\omega_1), J(\omega_2), \ldots, J(\omega_N)$.
Let $y(n)$ be the linear prediction residual signal of the signal $x(n)$. The sharpness $J(n)$ of the signal $y(n)$ is defined by equation (1), where $E\{\cdot\}$ denotes the average of the quantity in braces; for a zero-mean signal this quantity is the excess kurtosis.
$$J(n) = \frac{E\{y^4(n)\}}{E^2\{y^2(n)\}} - 3 \tag{1}$$
The sharpness of $y(n)$ is known to be large for a source signal close to the microphone and to decrease as the source moves away from the microphone. Consider applying this property to the band signals $X(\omega_i)$ obtained by band division. The sharpness of each band signal $X(\omega_i)$ is measured; a band whose sharpness exceeds a predetermined threshold $T$ is determined to be target sound signal, and a band whose sharpness is at or below the threshold is determined to be a noise signal component. For the time waveform $x(n)$ it was necessary first to apply linear prediction, obtain the residual signal $y(n)$, and measure the sharpness of that residual, because the linear prediction removes the envelope information of the speech. In each band-divided component, however, no envelope information of the speech remains, so the present invention defines the sharpness $\tilde{J}(\omega_i, j)$ of the band-divided signal $X(\omega_i)$ by equation (2) and uses it to determine the attribute of the signal component in each band.
$$\tilde{J}(\omega_i, j) = \frac{E\{x^4(\omega_i, j)\}}{E^2\{x^2(\omega_i, j)\}} - 3 \tag{2}$$
Here $i$ is the band index and $j$ is the frame index.
The band-specific feature calculation means 3 calculates the sharpness $\tilde{J}(\omega_i, j)$ defined by equation (2) for each band. The band-specific signal determination means 4 determines a band whose sharpness is at or above a certain threshold to be a target sound signal component and a band whose sharpness is at or below the threshold to be a noise signal.
As in the case of FIG. 1, the band-specific weight value determination means 5 sets the weight value to $\alpha(\omega_i) = 1.0$ for a band determined to be a target sound signal component, and to a value close to zero, $\alpha(\omega_i)$ with $0 \le \alpha(\omega_i) \le 1$, for a band determined to be a noise signal component. Each band signal $X(\omega_i)$ is multiplied by its weight value $\alpha(\omega_i)$, and the weighted band signals $\alpha(\omega_i) X(\omega_i)$ are returned to a time signal by the signal synthesis means 7, yielding a target sound signal from which the noise components have been removed.
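The sharpness of equation (2) is straightforward to compute once a sequence of samples is available for a band. The sketch below is an illustration only: it assumes a real-valued sequence per band and takes $E\{\cdot\}$ as the plain average over that sequence, neither of which the text spells out, and the function name `sharpness`, the threshold of 0.0, and the two test sequences are hypothetical.

```python
def sharpness(samples):
    """E{x^4} / E^2{x^2} - 3 over a real-valued sequence (average taken over the samples)."""
    n = len(samples)
    m2 = sum(v ** 2 for v in samples) / n   # second moment E{x^2}
    m4 = sum(v ** 4 for v in samples) / n   # fourth moment E{x^4}
    if m2 == 0.0:
        raise ValueError("sharpness is undefined for an all-zero sequence")
    return m4 / (m2 ** 2) - 3.0

# Two hypothetical sequences with the same average power (E{x^2} = 1):
flat = [1.0, -1.0] * 8           # energy spread evenly over time
peaky = [0.0] * 15 + [4.0]       # energy concentrated in a single sample

threshold = 0.0                  # assumed threshold T
is_target = [sharpness(seq) > threshold for seq in (flat, peaky)]
```

Both sequences have the same power, so a power threshold could not tell them apart, whereas the sharpness is -2 for the flat sequence and 13 for the peaky one. This is the sense in which sharpness measures how peaked a signal is in the time domain.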
In the examples above, the band-specific signal determination means 4 determined the attribute of each band signal using a predetermined threshold $T$. This approach is effective when the noise signal is sufficiently small relative to the target sound signal, but as the noise signal grows, $T$ has to be set higher. Speech signals, on the other hand, generally have lower power at higher frequencies (see FIG. 5). When the noise becomes large, $T$ must therefore be raised to suppress the influence of the low-frequency components of the noise, and as a result even the high-frequency components of the target sound signal are suppressed (see FIG. 5).
FIG. 4 shows an example (corresponding to claim 8) that addresses this problem. In this example, threshold calculation means 8 that calculates an appropriate threshold for each of several bands is provided, and the band-specific signal determination means 4 uses the thresholds calculated by the threshold calculation means 8 to determine the signal attributes appropriately.
In other words, this example exploits two characteristics of speech signals: they have several (usually about three) formant frequencies (see FIG. 5), and their power decays toward higher frequencies. The received signal $s(t) + n(t)$ is separated into several bands, for example about three, and the threshold calculation means 8 calculates a threshold suited to each band, so that the present invention can be applied even when the noise is fairly large. The condition that the power of the noise signal is smaller than the power of the target sound signal is still required.
A concrete method is described below. As shown in FIG. 5, the power of a speech signal normally decays toward higher frequencies. When the noise is fairly large, removing the noise components of the entire band with a single threshold $T$ means setting $T$ high enough to remove the low-frequency components of the noise, which in turn attenuates the high-frequency components of the target signal. The signal is therefore divided into several (for example three) bands, and the threshold calculation means 8 calculates a threshold suited to each band ($T_1$, $T_2$, $T_3$). One way to divide the bands is to use the formant frequencies of an average speech signal (first formant frequency $f_1$, second formant frequency $f_2$, third formant frequency $f_3$): the band at or below $f_2$ is the first band, the band from $f_2$ up to but not including $f_3$ is the second band, and the band at or above $f_3$ is the third band.
The calculation of the thresholds ($T_1$, $T_2$, $T_3$) is described using $T_1$ as an example. In the first band, the frequency component $X(\omega_{\mathrm{Max1}})$ with the largest power in the received signal is selected. This component can be judged very likely to belong to the target sound signal. Its power $20\log_{10}|X(\omega_{\mathrm{Max1}})|$ is therefore calculated, and a value that is, for example, 20 dB below that power (other values such as 10 dB or 15 dB may also be used) is taken as the threshold $T_1$. That is, $T_1 = 20\log_{10}|X(\omega_{\mathrm{Max1}})| - 20$. As a result, any signal component in the first band that is 20 dB or more below the component with the largest power is determined to be a noise component and is suppressed.
The threshold $T_2$ is obtained in the same way: the power $20\log_{10}|X(\omega_{\mathrm{Max2}})|$ of the frequency component with the largest power in the second band is calculated, and the threshold is set to $T_2 = 20\log_{10}|X(\omega_{\mathrm{Max2}})| - 20$. The same applies to $T_3$. By this method the threshold calculation means 8 obtains a threshold suited to each band and inputs the result to the band-specific signal determination means 4. Because the band-specific signal determination means 4 determines the attribute of each band signal using the threshold calculated for its band, the noise components in each band can be determined more accurately than with claim 6, even when the noise signal is fairly large. The above description shows that the target signal can be extracted from a received signal in which noise from a distance is mixed.
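The per-band threshold rule can be sketched as follows. This is an illustration under assumptions rather than the patented procedure: the function names, the formant boundaries of 1500 Hz and 2500 Hz, and the band powers are hypothetical, and a component exactly at $f_2$ is assigned to the second band because the text leaves that boundary ambiguous.

```python
def band_index(freq_hz, f2_hz, f3_hz):
    """0 for the first band (below f2), 1 for f2 up to f3, 2 for f3 and above."""
    if freq_hz < f2_hz:
        return 0
    if freq_hz < f3_hz:
        return 1
    return 2

def adaptive_thresholds(power_db, freqs_hz, f2_hz, f3_hz, margin_db=20.0):
    """T_k = (largest power in band k) - margin, for k = 1, 2, 3."""
    peaks = [float("-inf")] * 3
    for p, f in zip(power_db, freqs_hz):
        k = band_index(f, f2_hz, f3_hz)
        peaks[k] = max(peaks[k], p)
    return [peak - margin_db for peak in peaks]

def classify(power_db, freqs_hz, thresholds, f2_hz, f3_hz):
    """True where a component exceeds the threshold of its own band (target), else False."""
    return [p > thresholds[band_index(f, f2_hz, f3_hz)]
            for p, f in zip(power_db, freqs_hz)]

# Hypothetical component frequencies (Hz) and powers (dB); power falls off toward high frequencies.
freqs_hz = [100.0, 500.0, 1500.0, 2000.0, 2800.0, 3500.0]
power_db = [40.0, 25.0, 30.0, 5.0, 18.0, 10.0]
f2_hz, f3_hz = 1500.0, 2500.0    # assumed formant frequencies

thresholds = adaptive_thresholds(power_db, freqs_hz, f2_hz, f3_hz)
per_band = classify(power_db, freqs_hz, thresholds, f2_hz, f3_hz)
single = [p > 20.0 for p in power_db]   # one global threshold, for comparison
```

With these numbers the thresholds come out as 20 dB, 10 dB, and -2 dB. A single 20 dB threshold would discard the 18 dB and 10 dB components in the third band, while the per-band thresholds keep them and still reject the 5 dB component in the second band.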
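The band dividing means 2, band-specific feature calculation means 3, band-specific signal determination means 4, band-specific weight value determination means 5, band-specific weight value multiplication means 6, signal synthesis means 7, and threshold calculation means 8 described in the above examples can each be realized on a computer by installing a proximity sound separation and collection program, written in a computer-readable programming language, and having the computer's CPU interpret and execute it, so that the computer functions as the proximity sound separation and collection device. The program can be recorded on a computer-readable recording medium such as a magnetic disk or CD-ROM and installed on the computer from such a medium or through a communication line.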
The proximity sound separation and collection device according to the present invention can be used in, for example, hands-free audio conferencing systems.
N: noise source
M: target sound source
n(t): noise signal
s(t): target sound signal
1: input means
2: band dividing means
3: band-specific feature calculation means
4: band-specific signal determination means
5: band-specific weight value determination means
6: band-specific weight value multiplication means
7: signal synthesis means
8: threshold calculation means
Claims (6)
A proximity sound separation and collection method that, using at least one voice input means, suppresses noise in an environment where a target sound source and a noise source are present and emphasizes and collects the target sound signal, the method including:
band division processing that divides each output signal of the voice input means into a plurality of band signals within the voice band;
band-specific feature calculation processing that calculates a sound source feature of each band signal;
band-specific signal determination processing that determines, based on the feature calculated for each band in the band-specific feature calculation processing, whether the band signal is mainly composed of the signal of the target sound source or mainly composed of the signal of the noise source;
band-specific weight value determination processing that determines a weight value for each band based on the result of the band-specific signal determination processing;
band-specific weight value multiplication processing that multiplies each band signal by the weight value determined in the band-specific weight value determination processing; and
signal synthesis processing that returns the signals weighted in the band-specific weight value multiplication processing to a time waveform,
wherein the band-specific feature calculation processing calculates, as the feature of each band signal, a sharpness indicating the degree of peakedness in the time domain, and the band-specific signal determination processing determines a band signal whose sharpness is equal to or above a preset threshold to be a band signal mainly composed of the target sound signal, and a band signal whose sharpness is equal to or below the threshold to be a band signal mainly composed of noise.
A proximity sound separation and collection device that, using at least one voice input means, suppresses noise in an environment where a target sound source and a noise source are present and emphasizes and collects the target sound signal, the device comprising:
band dividing means that divides each output signal of the voice input means into a plurality of band signals within the voice band;
band-specific feature calculation means that calculates an acoustic feature of each band signal;
band-specific signal determination means that determines, based on the feature calculated for each band by the band-specific feature calculation means, whether the band signal is mainly composed of the signal of the target sound source;
band-specific weight value determination means that determines a weight value for each band based on the result determined by the band-specific signal determination means;
band-specific weight value multiplication means that multiplies each band signal by the weight value determined by the band-specific weight value determination means; and
signal synthesis means that returns the signals weighted by the band-specific weight value multiplication means to a time waveform,
wherein the band-specific feature calculation means calculates, as the feature of each band signal, a sharpness indicating the degree of peakedness in the time domain, and the band-specific signal determination means determines a band signal whose sharpness is equal to or above a preset threshold to be a band signal mainly composed of the target sound signal, and a band signal whose sharpness is equal to or below the threshold to be a band signal mainly composed of noise.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004373810A JP4533126B2 (en) | 2004-12-24 | 2004-12-24 | Proximity sound separation / collection method, proximity sound separation / collection device, proximity sound separation / collection program, recording medium |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2006178333A JP2006178333A (en) | 2006-07-06 |
JP4533126B2 true JP4533126B2 (en) | 2010-09-01 |
Family
ID=36732491
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2004373810A Expired - Fee Related JP4533126B2 (en) | 2004-12-24 | 2004-12-24 | Proximity sound separation / collection method, proximity sound separation / collection device, proximity sound separation / collection program, recording medium |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP4533126B2 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS57104194A (en) * | 1980-12-19 | 1982-06-29 | Sony Corp | Voice/silence discriminator |
JPS61286900A (en) * | 1985-06-14 | 1986-12-17 | Sony Corporation | Signal processor |
JPH03122699A (en) * | 1989-10-05 | 1991-05-24 | Ricoh Co Ltd | Noise removing device and voice recognition device using same device |
JPH09171397A (en) * | 1995-12-20 | 1997-06-30 | Oki Electric Ind Co Ltd | Background noise eliminating device |
JP2002140100A (en) * | 2000-11-02 | 2002-05-17 | Matsushita Electric Ind Co Ltd | Noise suppressing device |
JP2002169599A (en) * | 2000-11-30 | 2002-06-14 | Toshiba Corp | Noise suppressing method and electronic equipment |
JP2004341339A (en) * | 2003-05-16 | 2004-12-02 | Mitsubishi Electric Corp | Noise restriction device |
Also Published As
Publication number | Publication date |
---|---|
JP2006178333A (en) | 2006-07-06 |
Similar Documents
Publication | Title |
---|---|
JP5641186B2 (en) | Noise suppression device and program |
JP5528538B2 (en) | Noise suppressor |
JP5127754B2 (en) | Signal processing device |
US9854358B2 (en) | System and method for mitigating audio feedback |
US20080140396A1 (en) | Model-based signal enhancement system |
JP6019969B2 (en) | Sound processor |
US20140337021A1 (en) | Systems and methods for noise characteristic dependent speech enhancement |
US20120185246A1 (en) | Noise suppression using multiple sensors of a communication device |
JP2013518477A (en) | Adaptive noise suppression by level cue |
JP6174856B2 (en) | Noise suppression device, control method thereof, and program |
WO2015129760A1 (en) | Signal-processing device, method, and program |
JP4448464B2 (en) | Noise reduction method, apparatus, program, and recording medium |
US20170251319A1 (en) | Method and apparatus for synthesizing separated sound source |
KR20110021419A (en) | Apparatus and method for reducing noise in the complex spectrum |
JP4533126B2 (en) | Proximity sound separation / collection method, proximity sound separation / collection device, proximity sound separation / collection program, recording medium |
JP5443547B2 (en) | Signal processing device |
JP5915281B2 (en) | Sound processor |
JP2000081900A (en) | Sound absorbing method, and device and program recording medium therefor |
JP5609157B2 (en) | Coefficient setting device and noise suppression device |
JP2003271166A (en) | Input signal processing method and input signal processor |
US9210507B2 (en) | Microphone hiss mitigation |
JP6790659B2 (en) | Sound processing equipment and sound processing method |
JP2015169901A (en) | Acoustic processing device |
JP5316127B2 (en) | Sound processing apparatus and program |
KR101537653B1 (en) | Method and system for noise reduction based on spectral and temporal correlations |
Legal Events
Code | Title | Description |
---|---|---|
RD03 | Notification of appointment of power of attorney | Free format text: JAPANESE INTERMEDIATE CODE: A7423; Effective date: 2006-12-25 |
A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 2007-01-26 |
A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 2009-11-24 |
A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 2009-12-08 |
A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 2010-02-03 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 2010-06-01 |
A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 2010-06-11 |
R150 | Certificate of patent or registration of utility model | Free format text: JAPANESE INTERMEDIATE CODE: R150 |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 2013-06-18; Year of fee payment: 3 |
FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 2014-06-18; Year of fee payment: 4 |
LAPS | Cancellation because of no payment of annual fees | |