JP2008135933A - Voice emphasizing processing system - Google Patents

Voice emphasizing processing system

Info

Publication number: JP2008135933A
Application number: JP2006320101A
Authority: JP (Japan)
Prior art keywords: frequency band, sound source, target sound, processing system
Legal status: Pending
Other languages: Japanese (ja)
Inventors: Yoichi Suzuki (鈴木 陽一), Shuichi Sakamoto (坂本 修一), Junfeng Li (李 軍鋒), Satoru Hongo (本郷 哲)
Assignee (current and original): Tohoku University NUC; Institute of National Colleges of Technologies Japan
Application filed by Tohoku University NUC and Institute of National Colleges of Technologies Japan
Priority to JP2006320101A
Publication of JP2008135933A

Abstract

PROBLEM TO BE SOLVED: To provide a voice emphasizing processing system that extracts a target sound-source signal located in the front direction from observed mixed signals (a plurality of sound-source signals).

SOLUTION: The voice emphasizing processing system has a signal sound-pickup means 101 that inputs acoustic signals generated by a plurality of sound sources through left and right receiving sections, and a signal conversion means 102 that divides the two input signals into frequency bands to obtain the frequency-band components of the left and right input signals. It further has a sound-source identification means 103 that identifies the target sound source using a coherence component and an interaural level difference (ILD), both obtained from the frequency-band components, and a noise-source identification means 104 that identifies noise sources from the frequency-band components. A sound-source extraction means 105 then extracts the target sound source in each frequency band on the basis of the target sound source obtained by the sound-source identification means and the noise sources obtained by the noise-source identification means. The target sound source can thereby be separated and emphasized better than with conventional techniques.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a technique for extracting the target sound-source signal located in the front direction from an observed mixture of signals from a plurality of sound sources.

In recent years, techniques such as blind source separation (BSS) and adaptive beamforming (ABF) have been studied as sound-source separation methods based on independent component analysis (ICA).

As a binaural model for sound-source separation over a wide frequency band, the Frequency Domain Binaural Model (FDBM), which uses the interaural level difference and the interaural phase difference, has been proposed. The FDBM estimates the direction of arrival of a sound source by using the interaural phase difference for low-frequency signals and the interaural level difference for high-frequency signals. Because only the sound source in a specific direction is then separated by filtering based on the estimate, both the accuracy of direction estimation over a wide frequency band and the separation performance are improved. In this method, however, the sound-source direction is estimated only in the horizontal plane, i.e., at an elevation angle of 0°; when one tries to determine the direction of a source with a nonzero elevation, several points in space share the same interaural phase and level differences, so the direction cannot be estimated.

To solve this problem, Patent Document 1 presents a system that separates a specific acoustic signal from a plurality of sound sources located above and below the horizontal plane. The system comprises: means for inputting the acoustic signals generated by the sound sources through left and right receiving sections; means for dividing the two input signals into frequency bands; means for obtaining, in each frequency band, the interaural phase difference (IPD) from the cross spectrum of the two signals and the interaural level difference (ILD) from the difference of their power spectra; means for estimating candidate sound-source directions in each frequency band by comparing the IPD and/or ILD with database values over all frequency bands; means for taking as the sound-source direction the direction that appears most frequently among the per-band candidates; and means for separating a sound source by extracting mainly the frequency bands associated with the direction of the specific source. With this two-input system, a specific acoustic signal can be separated from a plurality of sound sources, including sources having both an azimuth and an elevation angle.

JP 2004-325284 A

However, the conventional methods do not improve the accuracy of sound-source separation, particularly in environments with reflected sound or with many surrounding nonstationary or stationary noise sources.

The object of the present invention is to provide a system that separates and emphasizes a target sound source well by identifying, from two inputs, the sound source to be extracted and the noise sources to be suppressed, comparing their S/N ratios on the time-frequency plane, and emphasizing the extracted source.

To achieve the above object, the voice emphasizing processing system according to claim 1 extracts a target sound-source signal located in the front direction from observed mixed signals, and comprises: signal sound-pickup means for inputting the acoustic signals generated by a plurality of sound sources through left and right receiving sections; signal conversion means for dividing the two input signals into frequency bands to obtain the frequency-band components of the left and right input signals; sound-source identification means for identifying the target sound source on the basis of a coherence component and an interaural level difference (ILD), both obtained from the frequency-band components; noise-source identification means for identifying noise sources from the frequency-band components; and sound-source extraction means for extracting the target sound source in each frequency band on the basis of the target sound source obtained by the sound-source identification means and the noise sources obtained by the noise-source identification means.

In the voice emphasizing processing system according to claim 2, the sound-source identification means comprises: means for obtaining the real part of the coherence component (RealCoh) from the frequency-band components using a coherence function; means for obtaining the interaural sound-pressure level difference (ILD) from the frequency-band components; and means for deriving the presence or absence of the target sound source in each frequency band (SAPCohILD) using preset conditional expressions based on the per-band RealCoh and ILD values.

In the voice emphasizing processing system according to claim 3, the noise-source identification means comprises means for calculating a noise ratio in each frequency band from the frequency-band components and deriving the presence or absence of a noise source in each frequency band (SAPNOR) by comparing that value with a preset index value.

In the voice emphasizing processing system according to claim 4, the sound-source extraction means comprises means for extracting the target sound source in each frequency band by deriving the presence or absence of the target sound source in each band (SAPSNR) from the per-band SAPCohILD values obtained by the sound-source identification means and the SAPNOR values obtained by the noise-source identification means.

In the voice emphasizing processing system according to claim 5, the sound-source extraction means comprises means for extracting the target sound source in each frequency band by calculating the per-band S/N ratio and deriving the presence or absence of the target sound source (SAPSNR) by comparing that value with a preset index value.

According to the invention of claim 1, the target sound source is identified by obtaining the real part of the coherence component (RealCoh) and the interaural level difference (ILD) from the frequency-band components of the two input signals, and the noise sources are identified from the same components. The frontal target sound source can thus be identified in each frequency band, and it can be separated and emphasized better than with conventional methods. The approach is effective even in environments where reflected sound is present.

According to the invention of claim 2, the presence or absence of the target sound source can be identified in each frequency band.

According to the invention of claim 3, the presence or absence of a noise source can be identified in each frequency band.

According to the inventions of claims 4 and 5, the target sound source is extracted in each frequency band on the basis of the target sound source obtained by the sound-source identification means and the noise sources obtained by the noise-source identification means, so that the target sound source can be separated and emphasized better than with conventional methods.

Next, a voice emphasizing processing system according to an embodiment of the present invention is described with reference to the drawings. The present invention is not limited to this embodiment.

FIG. 1 shows the configuration of the voice emphasizing processing system according to the embodiment of the present invention. As shown in FIG. 1, the system extracts a target sound-source signal located in the front direction from observed mixed signals, and comprises: signal sound-pickup means 11 for inputting the acoustic signals generated by a plurality of sound sources through left and right receiving sections; signal conversion means 12 for dividing the two input signals into frequency bands to obtain the frequency-band components of the left and right input signals; sound-source identification means 13 for identifying the target sound source using a coherence component and an interaural level difference (ILD), both obtained from the frequency-band components; noise-source identification means 14 for identifying noise sources from the frequency-band components; and sound-source extraction means 15 for extracting the target sound source in each frequency band on the basis of the target sound source obtained by the sound-source identification means and the noise sources obtained by the noise-source identification means.

The signal sound-pickup means 11 receives, at the left and right receiving sections, acoustic signals in which the target sound source S and noise sources Ni (i = 1, ..., number of noise sources) are mixed in an acoustic space containing a plurality of sound sources.

The signal conversion means 12 divides each of the two input signals obtained by the signal sound-pickup means 11 into frequency bands using an FFT (Fast Fourier Transform), yielding the frequency-band components of the left and right input signals.
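The operation of the signal conversion means can be sketched as a windowed FFT applied frame by frame (a short-time Fourier transform). The following is a minimal illustration, not the patented implementation; the frame length, hop size, and Hanning window are assumptions made for the sketch:

```python
import numpy as np

def to_frequency_bands(x, frame_len=256, hop=128):
    """Split a time signal into frequency-band components X(k, l)
    via a windowed FFT per frame (a simple STFT)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    X = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
    for l in range(n_frames):
        frame = x[l * hop : l * hop + frame_len] * window
        X[:, l] = np.fft.rfft(frame)  # one column per time frame l
    return X

# Example: a 1 kHz tone sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
X = to_frequency_bands(x)
# bin spacing is fs / frame_len = 31.25 Hz, so 1 kHz falls in bin 32
peak_bin = int(np.argmax(np.abs(X[:, 0])))
```

In the actual system this decomposition would be applied to both the left and the right input signal, producing XL(k, l) and XR(k, l).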

The sound-source identification means 13 comprises: means A for obtaining the real part of the coherence component (RealCoh) from the frequency-band components obtained by the signal conversion means 12 using a coherence function; means B for obtaining the interaural sound-pressure level difference (ILD) from the frequency-band components; and means C for deriving the presence or absence of the target sound source in each frequency band (SAPCohILD) using preset conditional expressions based on the per-band RealCoh and ILD values.

In means A, the coherence function is defined by equation (1), and its real part is obtained by equation (2). Here, XL is the frequency-band component of the left input signal, XR is the frequency-band component of the right input signal, XR* is the complex conjugate of XR, k denotes frequency, and l denotes time.

Coh(k,l) = E[ XL(k,l)·XR*(k,l) ] / sqrt( E[ |XL(k,l)|^2 ]·E[ |XR(k,l)|^2 ] )   …(1)

RealCoh(k,l) = Re{ Coh(k,l) }   …(2)

FIG. 2 shows the real part of the coherence function (RealCoh), averaged over all frequency bands, obtained experimentally. The results show that RealCoh takes large values (e.g., RealCoh ≥ 0.9) for signals from the front direction and small values (e.g., RealCoh ≤ 0.2) for signals from other directions. The real part (RealCoh) can therefore be used to distinguish the frontal target sound source from noise sources in other directions.
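As an illustration of means A, the following sketch estimates RealCoh from left and right frequency-band components, approximating the expectation E[.] of equation (1) by an average over time frames (that averaging choice is an assumption of this sketch):

```python
import numpy as np

def real_coherence(XL, XR):
    """Real part of the interaural coherence per frequency band,
    with E[.] approximated by averaging over time frames (axis 1)."""
    cross = np.mean(XL * np.conj(XR), axis=1)
    power_l = np.mean(np.abs(XL) ** 2, axis=1)
    power_r = np.mean(np.abs(XR) ** 2, axis=1)
    coh = cross / np.sqrt(power_l * power_r)
    return coh.real

rng = np.random.default_rng(0)
# Frontal source: identical components at both ears -> RealCoh close to 1
s = rng.standard_normal((64, 200)) + 1j * rng.standard_normal((64, 200))
front = real_coherence(s, s)
# Independent signals at the two ears -> RealCoh close to 0
n = rng.standard_normal((64, 200)) + 1j * rng.standard_normal((64, 200))
diffuse = real_coherence(s, n)
```

The two test cases mirror the experimental observation: a perfectly coherent (frontal) pair yields RealCoh ≈ 1, while incoherent (lateral or diffuse) signals yield values near 0.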

However, RealCoh is periodic, and signals from directions other than the front may also take large values at some frequencies. FIG. 3 shows the per-band values of RealCoh obtained experimentally over all frequency bands. Means A alone therefore cannot always extract the frontal target sound source in every frequency band.

Next, means B is described. In means B, the frequency-band components XL of the left input signal and XR of the right input signal are modeled by equation (3), from which the interaural sound-pressure level difference (ILD) is obtained by equation (4). Here, HL is the transfer function from the sound source to the left ear, HR is the transfer function from the sound source to the right ear, and S is the sound source.

XL(k,l) = HL(k)·S(k,l),  XR(k,l) = HR(k)·S(k,l)   …(3)

ILD(k,l) = 20·log10( |XL(k,l)| / |XR(k,l)| ) = 20·log10( |HL(k)| / |HR(k)| )   …(4)

Equation (4) shows that, in every frequency band, the interaural sound-pressure level difference (ILD) is small for signals from the front direction and large for signals from other directions.
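The ILD can be computed directly from the band components; a minimal sketch follows (the small eps guard against division by zero is an addition for numerical safety, not part of the patent):

```python
import numpy as np

def interaural_level_difference(XL, XR, eps=1e-12):
    """ILD in dB per frequency band and frame: 20*log10(|XL| / |XR|)."""
    return 20.0 * np.log10((np.abs(XL) + eps) / (np.abs(XR) + eps))

# Frontal source: HL == HR, so the ILD is 0 dB regardless of S
S = np.array([[1 + 1j, 2.0, 0.5j]])
ild_front = interaural_level_difference(1.0 * S, 1.0 * S)
# Lateral source: the near ear is, say, twice as loud (about 6 dB)
ild_side = interaural_level_difference(2.0 * S, 1.0 * S)
```

Because the source spectrum S cancels in the ratio, the ILD depends only on the transfer functions, which is exactly why it indicates direction.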

Next, means C is described. In means C, the presence or absence of the target sound source in each frequency band (SAPCohILD) is derived from the per-band RealCoh values obtained by means A and the per-band ILD values obtained by means B, using preset conditional expressions. The SAPCohILD value is obtained by equation (5)', and the target sound sources SL and SR are then obtained for each frequency band by equation (5). Here, SL is obtained from the frequency-band component XL of the left input signal, and SR from the frequency-band component XR of the right input signal.

SAPCohILD(k,l) = 1   if RealCoh(k,l) > T1 and ILD(k,l) < P1
SAPCohILD(k,l) = 0   if RealCoh(k,l) < T2 and ILD(k,l) > P2
SAPCohILD(k,l) = an interpolated value between 0 and 1   otherwise   …(5)'

SL(k,l) = SAPCohILD(k,l)·XL(k,l),  SR(k,l) = SAPCohILD(k,l)·XR(k,l)   …(5)

In equation (5)', T1, T2, P1, and P2 are preset thresholds. When the RealCoh value is larger than T1 and the ILD value is smaller than P1, SAPCohILD is set to 1; when the RealCoh value is smaller than T2 and the ILD value is larger than P2, SAPCohILD is set to 0; in all other cases, an interpolated value between 0 and 1 (e.g., 0.2, 0.5, or 0.8) is assigned.
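The three-way decision of equation (5)' can be sketched as follows. The concrete threshold values, the use of |ILD| so that left and right offsets are treated alike, and the single interpolated value 0.5 are all assumptions made for this illustration; the patent only states that the thresholds are preset:

```python
import numpy as np

# Hypothetical thresholds; the patent leaves T1, T2, P1, P2 to be preset.
T1, T2 = 0.9, 0.2   # RealCoh thresholds
P1, P2 = 3.0, 9.0   # ILD thresholds in dB
SOFT = 0.5          # assumed interpolated value for the ambiguous region

def sap_coh_ild(real_coh, ild):
    """Per-band target-presence value following the three cases of Eq. (5)'."""
    sap = np.full(np.shape(real_coh), SOFT)
    sap[(real_coh > T1) & (np.abs(ild) < P1)] = 1.0   # clearly frontal target
    sap[(real_coh < T2) & (np.abs(ild) > P2)] = 0.0   # clearly lateral noise
    return sap

real_coh = np.array([0.95, 0.1, 0.5])
ild = np.array([1.0, 12.0, 5.0])
sap = sap_coh_ild(real_coh, ild)
# Eq. (5) then keeps the target per band as SL = SAPCohILD * XL
```

Band 1 is judged target (high coherence, small ILD), band 2 is judged noise, and band 3 falls into the interpolated middle case.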

Next, the noise-source identification means 14 is described. The noise-source identification means 14 calculates a noise ratio in each frequency band from the frequency-band components obtained by the signal conversion means 12, and derives the presence or absence of a noise source in each band (SAPNOR) by comparing that value with a preset index value.
A conventional method is used to calculate the noise ratio; examples include a method using a delay-and-sum array and Roman's algorithm.

The delay-and-sum method emphasizes the target sound by applying appropriate delays to the microphone signals so that the target components are brought into phase. Since the noise-source identification means 14 extracts the frontal sound source, the difference Z (= XL − XR) between the frequency-band component XL of the left input signal and the frequency-band component XR of the right input signal is computed with zero delay, and the noise ratio OIR(w,t) is obtained by equation (6). Here, XL or XR is used for Y1.

Roman's algorithm is shown in FIG. 4. In this method, the noise ratio OIR(w,t) is obtained by equation (6) using an adaptive filter W.

The presence or absence of a noise source in each frequency band (SAPNOR) is derived by comparing the noise ratio OIR(w,t) obtained by equation (6) with a preset index value. Equation (7) shows an example in which the index value is set to −6 dB.

SAPNOR(w,t) = 0   if OIR(w,t) > −6 dB
SAPNOR(w,t) = 1   otherwise   …(7)

From equation (7), SAPNOR is set to 0 when the noise ratio OIR(w,t) is larger than −6 dB, and to 1 otherwise.
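The zero-delay decision of equations (6) and (7) can be sketched as follows. Because the original equation image for (6) is not reproduced here, the exact form of OIR is an assumption of this sketch: it takes OIR as the ratio of the reference band power |Y1|^2 (with Y1 = XL) to the difference power |Z|^2, so that a frontal target (for which Z ≈ 0) yields a large OIR and hence SAPNOR = 0, consistent with how SAPNOR is used by means D below:

```python
import numpy as np

def sap_nor(XL, XR, threshold_db=-6.0, eps=1e-12):
    """Noise-presence decision per band and frame (assumed form of Eq. (6)).
    With zero delay the frontal target cancels in Z = XL - XR, so a large
    OIR = |Y1|^2 / |Z|^2 (here Y1 = XL) indicates that noise is absent."""
    Z = XL - XR
    oir_db = 10.0 * np.log10((np.abs(XL) ** 2 + eps) / (np.abs(Z) ** 2 + eps))
    return np.where(oir_db > threshold_db, 0.0, 1.0)  # Eq. (7), -6 dB index

# Frontal target: identical at both ears -> Z = 0 -> no noise (SAPNOR = 0)
XL = np.array([[1 + 1j, 2.0]])
nor_front = sap_nor(XL, XL.copy())
# Strongly noisy band: opposite-phase components -> |Z| = 2|XL| -> SAPNOR = 1
nor_side = sap_nor(XL, -XL)
```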

Next, the sound-source extraction means 15 is described. The sound-source extraction means 15 comprises: means D for extracting the target sound source in each frequency band by deriving the presence or absence of the target sound source (SAPSNR) from the per-band SAPCohILD and SAPNOR values; and means E for extracting the target sound source in each frequency band by calculating the per-band S/N ratio and deriving SAPSNR by comparing that value with a preset index value.

In means D, the per-band SAPCohILD values obtained by the sound-source identification means 13 are compared with the per-band SAPNOR values obtained by the noise-source identification means 14, and the presence or absence of the target sound source (SAPSNR) is derived for each band as follows:
・If SAPCohILD = 1 and SAPNOR = 0, then SAPSNR = 1.
・If SAPCohILD = 0 and SAPNOR = 1, then SAPSNR = 0.
In all other cases, the SAPSNR value is derived using means E.

In means E, the S/N ratio SNR(w,t) of each frequency band is calculated by equation (8), and the presence or absence of the target sound source in each band (SAPSNR) is derived by comparing that value with a preset index value T3.

Based on the SNR(w,t) obtained by equation (8), the SAPSNR value is obtained by equation (9).

SAPSNR(w,t) = 0   if SNR(w,t) < T3
SAPSNR(w,t) = 1   otherwise   …(9)

From equation (9), SAPSNR is set to 0 when SNR(w,t) is smaller than the threshold T3, and to 1 otherwise.
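The combination of means D and means E can be sketched as follows; the threshold T3 = 0 dB and the example per-band values are assumptions made for illustration:

```python
import numpy as np

T3 = 0.0  # assumed index value for means E, in dB

def derive_sap_snr(sap_cohild, sap_nor, snr_db):
    """Means D resolves the two clear cases directly; all other bands
    fall back to means E, the per-band SNR test of Eq. (9)."""
    out = np.where(snr_db < T3, 0.0, 1.0)              # means E (Eq. (9))
    out[(sap_cohild == 1.0) & (sap_nor == 0.0)] = 1.0  # target, no noise
    out[(sap_cohild == 0.0) & (sap_nor == 1.0)] = 0.0  # noise, no target
    return out

sap_snr = derive_sap_snr(
    np.array([1.0, 0.0, 0.5]),    # per-band SAPCohILD (from means 13)
    np.array([0.0, 1.0, 0.0]),    # per-band SAPNOR (from means 14)
    np.array([-10.0, 20.0, 5.0])  # per-band SNR in dB; only band 3 uses it
)
```

Bands 1 and 2 are decided by means D; band 3 is ambiguous, so its SAPSNR comes from the SNR test of means E.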

The sound-source extraction means 15 then obtains the target sound sources SL and SR for each frequency band by equation (10), based on the SAPSNR values derived by means D and means E described above. Here, SL is obtained from the frequency-band component XL of the left input signal, and SR from the frequency-band component XR of the right input signal.

SL(w,t) = SAPSNR(w,t)·XL(w,t),  SR(w,t) = SAPSNR(w,t)·XR(w,t)   …(10)
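Equation (10) amounts to a per-band mask applied to both channels; a minimal sketch with illustrative values:

```python
import numpy as np

def extract_target(XL, XR, sap_snr):
    """Eq. (10): keep each frequency-band component only where the target
    is judged present (SAPSNR = 1); zero it where SAPSNR = 0."""
    return sap_snr * XL, sap_snr * XR

XL = np.array([1 + 1j, 2.0 + 0j, 3j])
XR = np.array([1 - 1j, 2.0 + 0j, -3j])
mask = np.array([1.0, 0.0, 1.0])   # example SAPSNR values per band
SL, SR = extract_target(XL, XR, mask)
# The masked spectra SL, SR can then be inverse-transformed to obtain
# the emphasized time-domain signal.
```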

Next, results of comparison experiments between the voice emphasizing processing system of the present invention and systems using other conventional methods are described. FIGS. 5 and 6 show the results of experiments conducted under the following three conditions:
(1) Target sound source: front direction, 40 sentences; noise source (1): 60° direction, 40 sentences.
(2) Target sound source: front direction, 40 sentences; noise sources (2): 60° and −60° directions, 40 sentences.
(3) Target sound source: front direction, 40 sentences; noise sources (3): 60°, −60°, and 30° directions, 40 sentences.

FIG. 5 shows the S/N ratio for each system, and FIG. 6 shows the degree of distortion. In the legends of FIGS. 5 and 6, FDBM and Roman denote the results of the conventional methods, CohILD denotes the result obtained using only the target-presence values (SAPCohILD) from the sound-source identification means 13 of the present invention (an intermediate stage), and CohAndBF denotes the result of the present invention.

FIG. 5 demonstrates that, under the three conditions above, the system of the present invention achieves a higher S/N ratio than the other systems, and FIG. 6 demonstrates that it produces less distortion than the other systems.
These results show that the system of the present invention separates and emphasizes the target sound source better than the conventional methods.

FIG. 1 shows the configuration of the voice emphasizing processing system of the present invention.
FIG. 2 shows the real part of the coherence function (RealCoh), averaged over all frequency bands, obtained experimentally.
FIG. 3 shows the real part of the coherence function (RealCoh) obtained experimentally in all frequency bands.
FIG. 4 shows Roman's algorithm as a method of calculating the noise ratio.
FIG. 5 shows the results of comparison experiments on the S/N ratio between the voice emphasizing processing system of the present invention and systems using other conventional methods.
FIG. 6 shows the results of comparison experiments on the degree of distortion between the voice emphasizing processing system of the present invention and systems using other conventional methods.

Explanation of symbols

11 Signal sound-pickup means
12 Signal conversion means
13 Sound-source identification means
14 Noise-source identification means
15 Sound-source extraction means

Claims (5)

1. A speech enhancement processing system for extracting a target sound-source signal located in the front direction from an observed mixed signal, comprising: signal sound pickup means for inputting acoustic signals generated from a plurality of sound sources through left and right receiving units; signal conversion means for dividing the two left and right input signals into frequency bands to obtain frequency-band components of the two signals; sound-source identification means for identifying the target sound source based on a coherence component obtained from the frequency-band components and an interaural level difference (ILD) obtained from the frequency-band components; noise-source identification means for identifying a noise source from the frequency-band components; and sound-source extraction means for extracting the target sound source for each frequency band based on the target sound source obtained by the sound-source identification means and the noise source obtained by the noise-source identification means.

2. The speech enhancement processing system according to claim 1, wherein the sound-source identification means comprises: means for obtaining the real part of the coherence component (RealCoh) from the frequency-band components using a coherence function; means for obtaining the interaural level difference (ILD) from the frequency-band components; and means for deriving the presence or absence of the target sound source for each frequency band (SAP_CohILD) using conditional expressions, set in advance, based on the RealCoh value and the ILD value of each frequency band.

3. The speech enhancement processing system according to claim 1, wherein the noise-source identification means comprises means for calculating a noise ratio for each frequency band from the frequency-band components and deriving the presence or absence of a noise source for each frequency band (SAP_NOR) by comparing that ratio with a preset threshold value.

4. The speech enhancement processing system according to claim 1, wherein the sound-source extraction means comprises means for extracting the target sound source for each frequency band by deriving the presence or absence of the target sound source for each frequency band (SAP_SNR) based on the SAP_CohILD value for each frequency band obtained by the sound-source identification means and the SAP_NOR value obtained by the noise-source identification means.

5. The speech enhancement processing system according to claim 1, wherein the sound-source extraction means comprises means for extracting the target sound source for each frequency band by calculating the S/N ratio for each frequency band and deriving the presence or absence of the target sound source for each frequency band (SAP_SNR) by comparing that value with a preset threshold value.
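The per-band decision described in claims 1 and 2 can be sketched as a simple two-microphone mask: a frontal source reaches both ears with high coherence and near-zero level difference, so bands where the real part of the coherence is high and |ILD| is small are attributed to the target. The sketch below is a minimal illustration only, not the patent's implementation; the function names (`stft`, `frontal_target_mask`), the long-term spectral averaging, and the threshold values (0.8, 3 dB) are all assumptions for demonstration.

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bands)."""
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n)])
    return np.fft.rfft(frames, axis=1)

def frontal_target_mask(XL, XR, coh_min=0.8, ild_max_db=3.0, eps=1e-12):
    """Per-band target decision: keep bands where the two ear signals are
    coherent (RealCoh high) and nearly equal in level (|ILD| small), as
    expected for a source in the front direction."""
    cross = np.mean(XL * np.conj(XR), axis=0)           # cross-spectrum
    pl = np.mean(np.abs(XL) ** 2, axis=0)               # left power spectrum
    pr = np.mean(np.abs(XR) ** 2, axis=0)               # right power spectrum
    real_coh = np.real(cross) / np.sqrt(pl * pr + eps)  # real part of coherence
    ild_db = 10.0 * np.log10((pl + eps) / (pr + eps))   # interaural level difference
    return (real_coh > coh_min) & (np.abs(ild_db) < ild_max_db)

# Frontal 500 Hz tone (identical at both ears) plus noise in the left ear only
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 500 * t)
noise = np.random.default_rng(0).standard_normal(fs) * 0.5
XL, XR = stft(tone + noise), stft(tone)
mask = frontal_target_mask(XL, XR)
enhanced = XL * mask  # per-band extraction: keep only target-dominated bands
band_hz = fs / 256    # 31.25 Hz per band
print(bool(mask[int(500 / band_hz)]))  # band containing the frontal tone: True
print(bool(mask[60]))                  # a noise-dominated band: False
```

Claims 3 to 5 would add further per-band decisions of the same shape (a noise-ratio test yielding SAP_NOR, and an S/N-ratio test yielding SAP_SNR) and combine them with the coherence/ILD mask before extraction.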
JP2006320101A 2006-11-28 2006-11-28 Voice emphasizing processing system Pending JP2008135933A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2006320101A JP2008135933A (en) 2006-11-28 2006-11-28 Voice emphasizing processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2006320101A JP2008135933A (en) 2006-11-28 2006-11-28 Voice emphasizing processing system

Publications (1)

Publication Number Publication Date
JP2008135933A true JP2008135933A (en) 2008-06-12

Family

ID=39560474

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006320101A Pending JP2008135933A (en) 2006-11-28 2006-11-28 Voice emphasizing processing system

Country Status (1)

Country Link
JP (1) JP2008135933A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011527025A (en) * 2008-06-30 2011-10-20 オーディエンス,インコーポレイテッド System and method for providing noise suppression utilizing nulling denoising
JP2013125085A (en) * 2011-12-13 2013-06-24 Oki Electric Ind Co Ltd Target sound extraction device and target sound extraction program
US9384753B2 (en) 2010-08-30 2016-07-05 Samsung Electronics Co., Ltd. Sound outputting apparatus and method of controlling the same
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
US9558755B1 (en) 2010-05-20 2017-01-31 Knowles Electronics, Llc Noise suppression assisted automatic speech recognition
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
US9799330B2 (en) 2014-08-28 2017-10-24 Knowles Electronics, Llc Multi-sourced noise suppression
US9830899B1 (en) 2006-05-25 2017-11-28 Knowles Electronics, Llc Adaptive noise cancellation
JP2018036442A (en) * 2016-08-30 2018-03-08 富士通株式会社 Voice processing program, voice processing method and voice processing device

Similar Documents

Publication Publication Date Title
JP2008135933A (en) Voice emphasizing processing system
US9286908B2 (en) Method and system for noise reduction
JP5375400B2 (en) Audio processing apparatus, audio processing method and program
JP4521549B2 (en) A method for separating a plurality of sound sources in the vertical and horizontal directions, and a system therefor
JP6065028B2 (en) Sound collecting apparatus, program and method
JP2011113044A (en) Method, device and program for objective voice extraction
JP4816711B2 (en) Call voice processing apparatus and call voice processing method
CN109817234B (en) Target speech signal enhancement method, system and storage medium based on continuous noise tracking
JP2010112996A (en) Voice processing device, voice processing method and program
CN110706719B (en) Voice extraction method and device, electronic equipment and storage medium
JP6225245B2 (en) Signal processing apparatus, method and program
JP5605574B2 (en) Multi-channel acoustic signal processing method, system and program thereof
JP2007047427A (en) Sound processor
JP6348427B2 (en) Noise removal apparatus and noise removal program
JP6436180B2 (en) Sound collecting apparatus, program and method
JP2006227328A (en) Sound processor
KR20120020527A (en) Apparatus for outputting sound source and method for controlling the same
JP2016163135A (en) Sound collection device, program and method
JP6241520B1 (en) Sound collecting apparatus, program and method
JP6065029B2 (en) Sound collecting apparatus, program and method
JP6106618B2 (en) Speech section detection device, speech recognition device, method thereof, and program
JP2017040752A (en) Voice determining device, method, and program, and voice signal processor
KR101096091B1 (en) Apparatus for Separating Voice and Method for Separating Voice of Single Channel Using the Same
KR20100056859A (en) Voice recognition apparatus and method
JP2001337694A (en) Method for presuming speech source position, method for recognizing speech, and method for emphasizing speech