JP2005257748A - Sound pickup method, sound pickup system, and sound pickup program - Google Patents

Info

Publication number
JP2005257748A
Authority
JP
Japan
Prior art keywords
sound
signal
original sound
addition rate
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2004065513A
Other languages
Japanese (ja)
Other versions
JP4518817B2 (en)
Inventor
Mariko Aoki
真理子 青木
Suehiro Shimauchi
末廣 島内
Kenichi Furuya
賢一 古家
Akitoshi Kataoka
章俊 片岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP2004065513A
Publication of JP2005257748A
Application granted
Publication of JP4518817B2
Anticipated expiration
Expired - Fee Related

Abstract

PROBLEM TO BE SOLVED: To improve the S/N ratio of a voice signal on which noise is superposed.
SOLUTION: A sound pickup method includes noise suppression processing for suppressing the noise included in an input signal and emphasizing a target signal; speech section degree calculation processing for calculating the speech-section likelihood of each section of the input signal; original sound addition rate determination processing for determining, on the basis of the calculated speech-section likelihood, the rate at which the original sound is added to the noise-suppressed target signal; and original sound addition processing for adding the original sound to the noise-suppressed target signal in accordance with the determined original sound addition rate.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a sound collection method, a sound collection device, and a sound collection program that calculate the speech-section likelihood at each time Δt of a noise-suppressed signal and automatically set the original sound addition rate according to that likelihood, thereby suppressing noise while preserving speech quality.

Techniques that add the original sound to a noise-suppressed signal, thereby reducing distortion of the processed sound and improving perceived quality, have been proposed previously (Non-Patent Document 1).
The original sound addition method disclosed in Non-Patent Document 1 keeps the original sound addition rate constant over all sections along the time axis.
Non-Patent Document 1: Junko Sasaki, Yoichi Haneda, "Examination of a low-distortion single-input noise reduction method considering the masking effect," Proceedings of the Acoustical Society of Japan meeting, 2-3-11, pp. 525-526 (1997.3)

However, in an environment where the noise is extremely loud (louder than the target speech), adding the original sound at a constant rate high enough to improve speech quality also leaves a large amount of residual noise, so it is difficult to improve the S/N ratio while maintaining quality.
An object of the present invention is to propose a sound collection method, a sound collection device, and a sound collection program that improve the S/N ratio over all sections along the time axis while preserving quality in speech sections.

Whereas the conventional method uses a constant original sound addition rate in all sections, the present invention calculates the speech-section likelihood of the noise-suppressed signal and, while keeping speech quality as high as possible in speech-like sections, improves the S/N ratio over the whole signal more than the conventional method does.
As a concrete technique, the first embodiment of the present invention proposes a sound collection method comprising: noise suppression processing that suppresses noise included in an input signal and emphasizes a target signal; speech section degree calculation processing that calculates the speech-section likelihood of each section of the input signal; original sound addition rate determination processing that determines, based on the calculated speech-section likelihood, the rate at which the original sound is added to the noise-suppressed target signal; and original sound addition processing that adds the original sound to the noise-suppressed target signal according to the determined original sound addition rate.

In the first embodiment, a sound collection method is proposed in which the speech section degree calculation processing determines a section where the power of the input signal is at or above a predetermined value to be a speech section, and a section where it is at or below the predetermined value to be a noise section.
Furthermore, in the first embodiment, a sound collection method is proposed in which the speech section degree calculation processing determines a section where the power of the input signal is at or above a first set value TS to be a speech section, determines a section where the power is at or below a second set value TN, which is smaller than the first set value, to be a noise section, and, when the power lies between the first set value TS and the second set value TN, determines the speech likelihood according to the power of the input signal.

Furthermore, the second embodiment of the present invention proposes a sound collection method in which the speech section degree calculation processing uses at least one pair of voice input means installed at positions whose distances to the target sound source and to the noise source differ from each other. The target signal including the noise signal captured by the pair of voice input means is band-divided channel by channel; the band components of the channels are compared in level between channels for each identical band; according to the comparison result, each component is judged to be a speech signal component or a noise signal component in correspondence with the distance differences between the input means and the target sound source and noise source; the power of the frequency band components judged to be speech signal components is integrated; and the speech likelihood is determined from the integrated power.
Furthermore, the third embodiment of the present invention proposes a sound collection method that adds smoothing processing so that, when the original sound addition rate changes, it changes gradually. This prevents abrupt changes in the original sound addition rate and avoids distorting the speech.

For a received signal in which a target signal and a noise signal are mixed, the present invention calculates a physical quantity representing the speech-section likelihood and automatically adjusts the original sound addition rate according to that likelihood. It can therefore output speech with little distortion in speech sections and a signal with a large S/N ratio improvement (little residual noise) in noise sections. As a result, it outputs a speech signal that is easier to listen to and less noisy than the conventional method, which sets the original sound addition rate uniformly for all sections.

The embodiments that constitute the best mode for carrying out the present invention are described in detail below.
FIG. 1 shows the first embodiment of the present invention. The first embodiment is a sound collection device that operates using the sound collection method proposed in claim 1, and it corresponds to the sound collection device proposed in claim 6.
The sound collection device 100 proposed in the first embodiment comprises voice input means 1 consisting of a microphone, noise suppression means 2, speech section degree calculation means 3, original sound addition rate determination means 4, and original sound addition means 5.

The voice input means 1 receives a target signal S1(t) from a target sound source S and a noise signal S2(t) from a noise source N. A single noise source N is assumed here to simplify the description, but in general there may be several. The voice input means 1 outputs a signal x(t) in which the noise signal is superimposed on the target signal (hereinafter called the original sound signal).
The original sound signal x(t) output by the voice input means 1 is input to the noise suppression means 2, which suppresses the noise using a general technique such as spectral subtraction. The noise-suppressed signal is denoted S1′(t).
The noise-suppressed signal S1′(t) is supplied to the speech section degree calculation means 3, which calculates a "speech section degree" representing the speech-section likelihood.

As a way of calculating the speech-section likelihood, the sound collection method proposed in claim 2 calculates the power (Pow) of the noise-suppressed signal S1′(t); if the power exceeds a certain value TS, the section is judged to be speech-like and is determined to be a speech section. Conversely, if the power is TS or less, the section is determined to be a noise section.
In the sound collection method proposed in claim 3, two values TS and TN are set (TS > TN), and when Pow > TS the speech-section likelihood is judged to be 100%. When TN < Pow < TS, the closer Pow is to TS, the higher the speech-section likelihood is judged to be. For example, (Pow − TN)/(TS − TN) is used as the quantity representing the speech-section likelihood (the speech section degree).
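The graded speech section degree described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, frame power is assumed to be the mean squared amplitude (the text does not define how Pow is computed), and a degree of 0 for Pow ≤ TN is an assumption consistent with treating such sections as noise.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame (assumed definition of Pow)."""
    return sum(s * s for s in frame) / len(frame)


def speech_degree(pow_value, ts, tn):
    """Speech section degree in [0, 1] for a frame power and thresholds TS > TN."""
    if ts <= tn:
        raise ValueError("TS must be greater than TN")
    if pow_value >= ts:
        return 1.0  # speech-section likelihood of 100%
    if pow_value <= tn:
        return 0.0  # assumption: treated as a pure noise section
    # Between the thresholds: (Pow - TN) / (TS - TN), as in the text
    return (pow_value - tn) / (ts - tn)
```

With TS = 1.0 and TN = 0.0, a frame whose power is 0.5 receives a degree of 0.5.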

As shown in FIG. 2, the original signal x(t) may be used instead of the noise-suppressed signal S1′(t) when calculating the signal power Pow for the speech section degree. The configuration of FIG. 2 is advantageous in environments where the noise is not very large relative to the target signal (for example, an S/N ratio of about 10 dB) and where calculating the speech section degree after noise suppression would make the processing delay too long, for example for communication use. In such environments, using the received signal for the speech section degree calculation allows noise suppression and the speech section degree calculation to run in parallel, which shortens the processing delay at the cost of somewhat lower calculation accuracy.
The original sound addition rate determination means 4 dynamically determines the original sound addition rate according to the speech section degree calculated by the speech section degree calculation means 3. Two concrete methods are given. The first applies when the speech section degree calculation means 3 classifies each section as either a speech section or a noise section.
In this case, for a section judged to be a speech section (Pow > TS), the original sound addition rate α(t) is set high (for example, 0.3) to prioritize speech quality. For a section judged to be a noise section (Pow < TN), the original sound addition rate α(t) is set low (for example, 0.05) to prioritize the S/N ratio improvement.
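A minimal sketch of this two-class rule follows. The function name is hypothetical, the two rates are the example values given in the text, and a single threshold TS is assumed, matching the two-class decision described for claim 2.

```python
ALPHA_SPEECH = 0.3   # example rate from the text for speech sections
ALPHA_NOISE = 0.05   # example rate from the text for noise sections


def binary_addition_rate(pow_value, ts):
    """Return a high addition rate for speech sections, a low one otherwise."""
    # Assumption: one threshold TS separates speech from noise
    return ALPHA_SPEECH if pow_value > ts else ALPHA_NOISE
```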

Next, an operation example of the original sound addition rate determination means 4 is described for the case where the speech section degree calculation means 3 sets values TS and TN (TS > TN) and expresses the speech-section likelihood as (Pow − TN)/(TS − TN).
When Pow > TS, the speech-section likelihood is taken to be 100% and the maximum original sound addition rate α_max (for example, α_max = 0.5) is set as the addition rate α(t). When Pow < TN, the noise-section likelihood is taken to be 100% and the minimum original sound addition rate α_min (for example, α_min = 0.1) is set as the addition rate α(t). When TN < Pow < TS, the original sound addition rate α(t) is determined by the method shown in FIG. 3: on a graph with Pow on the horizontal axis and the original sound addition rate α(t) on the vertical axis, the straight line through the two points (TN, α_min) and (TS, α_max) gives the addition rate α(t) for each Pow. This operation is expressed in program-language form as Equation (1).
[Equation (1): image in the original publication, not reproduced in this text]
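The piecewise-linear rule described for FIG. 3 can be sketched as below. The function name is hypothetical and the defaults are the example values α_max = 0.5 and α_min = 0.1 from the text; how the boundary cases Pow = TS and Pow = TN are assigned is an assumption, although the line yields the same values there.

```python
def addition_rate(pow_value, ts, tn, alpha_max=0.5, alpha_min=0.1):
    """Original sound addition rate for a frame power, with thresholds TS > TN."""
    if ts <= tn:
        raise ValueError("TS must be greater than TN")
    if pow_value >= ts:
        return alpha_max  # speech section: favour speech quality
    if pow_value <= tn:
        return alpha_min  # noise section: favour S/N improvement
    # Straight line through (TN, alpha_min) and (TS, alpha_max)
    slope = (alpha_max - alpha_min) / (ts - tn)
    return alpha_min + slope * (pow_value - tn)
```

Because the rate only moves between α_min and α_max, some original sound is always mixed back in, even in sections judged to be pure noise.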
The original sound addition means 5 multiplies the original signal x(t) by the original sound addition rate α(t) determined by the original sound addition rate determination means 4, adds the result to the noise-suppressed signal S1′(t), and outputs the sum as the output signal y(t).
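In other words, the output is y(t) = S1′(t) + α(t)·x(t). A minimal per-sample sketch follows; the function name is hypothetical, and the signals are assumed to be equal-length sequences with one addition rate per sample.

```python
def add_original_sound(suppressed, original, alpha):
    """Mix the original signal back in: y[t] = s1'[t] + alpha[t] * x[t]."""
    if not (len(suppressed) == len(original) == len(alpha)):
        raise ValueError("all inputs must have the same length")
    return [s + a * x for s, x, a in zip(suppressed, original, alpha)]
```

In practice the rate would more likely be held constant over each analysis frame; the per-sample form is used here only to keep the sketch short.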

Next, the second embodiment of the present invention is described. The second embodiment corresponds to the sound collection method proposed in claim 4 and the sound collection device proposed in claim 7.
An example of the speech section degree calculation means 3 is described here. In the first embodiment, the signal used to calculate the speech section degree was either the signal S1′(t) noise-suppressed by the existing noise suppression means 2 or the original signal x(t). When S1′(t) is used and the noise suppression of the existing noise suppression means 2 does not work well, the accuracy of the speech section degree calculation means 3 may also suffer. When the original signal x(t) is used and the S/N ratio of the received signal is very poor (for example, 0 dB or negative), the speech section degree is difficult to calculate.

FIG. 4 shows an example of a method used in the second embodiment to calculate the speech section degree as accurately as possible without being affected by the noise suppression means 2 or by the S/N ratio of the original signal x(t). The input means 1A and 1B of the speech section degree calculation means 3 consist of, for example, two or more microphones. The band dividing means 6 performs frequency analysis, for example by Fourier transform, on the signals from the input means 1A and 1B. The inter-channel level difference calculation means 7 calculates, for each frequency component, the level difference ΔA(ω) between channels (the input means 1A side and the input means 1B side). ΔA(ω) is defined by Equation (2).
ΔA(ω) = 20 log10[ |X_1(ω)| / |X_2(ω)| ]   (2)
The sound source signal determination means 8 determines, based on the inter-channel level difference ΔA(ω), whether each frequency component belongs to the target signal or to the noise signal. For example, when the target sound source S is located closer to the input means 1A than to the input means 1B and the noise source N is located closer to the input means 1B than to the input means 1A, as in FIG. 4, a frequency component satisfying ΔA(ω) ≤ 0 is determined to be a noise signal component.
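Equation (2) and the noise decision can be sketched per frequency bin as follows. The function names are hypothetical, the spectra are assumed to be equal-length lists of complex bins from channels 1A and 1B, and the small `eps` guard against division by zero is an addition that is not in the source.

```python
import math


def level_difference_db(x1, x2, eps=1e-12):
    """Inter-channel level difference 20*log10(|X1| / |X2|) for each bin."""
    # eps (an assumption) avoids log of zero and division by zero
    return [20.0 * math.log10((abs(a) + eps) / (abs(b) + eps))
            for a, b in zip(x1, x2)]


def is_noise_component(delta_a_db):
    """A bin with a level difference of 0 dB or less is treated as noise."""
    return delta_a_db <= 0.0
```

A bin that is ten times stronger on channel 1A than on channel 1B has a level difference of about 20 dB and is therefore treated as a target signal component.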

Based on the determination by the sound source signal determination means 8, the sound source signal selection means 9 multiplies target signal components by a gain g_S and noise signal components by a gain g_N. The control rule of the sound source signal selection means 9 is expressed in program-language form as Equation (3).
if ΔA(ω) ≥ 0 then Ŝ_1(ω) = g_S · X_1(ω)
elseif ΔA(ω) ≤ 0 then Ŝ_1(ω) = g_N · X_1(ω)   (3)
As example gain values, g_S = 1.0 and g_N = 0.0 are used.
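A minimal sketch of this gain selection follows. The function name is hypothetical, and the defaults are the example gains from the text. Equation (3) lists ΔA(ω) = 0 in both branches; the sketch assumes such a bin is noise, following the determination rule stated for the sound source signal determination means 8.

```python
def select_source(x1, delta_a_db, g_s=1.0, g_n=0.0):
    """Weight each bin of channel 1 by g_s (target) or g_n (noise)."""
    # Assumption: a level difference of exactly 0 dB counts as noise
    return [(g_s if d > 0.0 else g_n) * x for x, d in zip(x1, delta_a_db)]
```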

The power integration means 10 integrates, over the entire frequency band, the power of the signal Ŝ_1(ω) weighted by the sound source signal selection means 9, and sends the integrated power Pow to the original sound addition rate determination means 4. The original sound addition rate determination means 4 uses this integrated power Pow to determine the addition rate by the same control as described in the first embodiment. The speech section degree calculation means 3 shown in FIG. 4 uses the direction information of the sound sources to determine whether each frequency component of the original signal belongs to the speech signal or to the noise signal; bands determined to be speech spectrum are emphasized by multiplying by a gain of 1.0, and bands determined to be noise spectrum are suppressed by multiplying by a gain of 0. This is effectively equivalent to improving the S/N ratio of the original signal x(t), so the speech section degree is calculated after the S/N ratio has been improved. For a signal with a poor S/N ratio, the speech section degree can therefore be calculated more accurately than by using the original signal as it is. In addition, because the speech section degree is calculated independently of the existing noise suppression processing, it is unaffected even if that processing performs poorly.
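The power integration step can be sketched as a sum of squared magnitudes over all bins. The function name is hypothetical, and the text does not say whether the sum is normalized; no normalization is applied here.

```python
def integrated_power(selected_spectrum):
    """Sum of squared magnitudes over all frequency bins (unnormalized)."""
    return sum(abs(c) ** 2 for c in selected_spectrum)
```

The resulting value plays the role of Pow when the addition rate is determined as in the first embodiment.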

Next, the third embodiment of the present invention is described. The third embodiment corresponds to the sound collection method proposed in claim 5 and the sound collection device proposed in claim 8.
A configuration example is shown in FIG. 5. The third embodiment is characterized by adding original sound addition rate smoothing means 11, which smooths the time variation of the original sound addition rate α(t) when the original sound addition rate determination means 4 described in the first embodiment changes it.
To smooth the time variation of the original sound addition rate α(t), the difference between the original sound addition rate α_pre(t) at the previous time and the original sound addition rate α(t) at the current time is calculated, and the original sound addition rate at the current time is determined according to the polarity (+, −) of that difference. An example of the determination method is expressed in program-language form as Equation (4).
if (α_pre(t) < α(t)) then α_smooth(t) = α_pre(t) + atk · (α(t) − α_pre(t))
else α_smooth(t) = α_pre(t) + rls · (α(t) − α_pre(t))   (4)
Here, atk and rls are values satisfying 0 < atk < 1 and 0 < rls < 1. With this operation, the original sound addition rate does not change sharply from one time step to the next; it only increases or decreases slightly from its previous value, so the time variation is smooth and the distortion of the processed sound is small. Whether the rate moves up or down is determined by the difference between the current original sound addition rate α(t) and the previous addition rate: it increases slightly when the current rate is higher than the previous one and decreases slightly when it is lower. The third embodiment can of course also be used together with the second embodiment.
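Equation (4) can be sketched as a one-step update with separate attack and release coefficients. The function name and the default values of `atk` and `rls` are assumptions; the text only requires both to lie strictly between 0 and 1.

```python
def smooth_addition_rate(alpha_prev, alpha_now, atk=0.3, rls=0.1):
    """Move the previous rate part of the way toward the current target rate."""
    if not (0.0 < atk < 1.0 and 0.0 < rls < 1.0):
        raise ValueError("atk and rls must lie strictly between 0 and 1")
    # Rising target uses the attack coefficient, falling target the release one
    coeff = atk if alpha_prev < alpha_now else rls
    return alpha_prev + coeff * (alpha_now - alpha_prev)
```

Choosing a larger atk than rls lets the rate rise quickly at speech onsets and fall back slowly afterwards, although the text does not prescribe which of the two should be larger.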

Thus, the above means make it possible to calculate a physical quantity representing the speech-section likelihood from a signal mixed with noise and to adjust the original sound addition rate automatically according to that likelihood. Compared with the conventional method of setting the original sound addition rate uniformly for all sections, distortion is reduced in speech sections and the S/N ratio improvement is secured in noise sections, so that a signal that is easy to listen to overall and has little residual noise can be extracted and collected.
The sound collection method of the present invention described above is realized by causing a computer to execute the sound collection program according to the present invention. The program is written in a computer-readable programming language, recorded on a recording medium such as a magnetic disk or a CD-ROM, and installed on the computer from that medium or through a communication line. The installed program is interpreted and executed by the central processing unit (CPU) of the computer.

The sound collection method and sound collection device according to the present invention can be applied, for example, to the sound collection stage of a speech recognition device, where improving the quality of the collected speech signal can improve the recognition rate.

FIG. 1 is a block diagram for explaining the first embodiment of the present invention.
FIG. 2 is a block diagram for explaining a modification of the first embodiment shown in FIG. 1.
FIG. 3 is a graph for explaining the operation of the first embodiment shown in FIG. 1.
FIG. 4 is a block diagram for explaining the second embodiment of the present invention.
FIG. 5 is a block diagram for explaining the third embodiment of the present invention.

Explanation of symbols

1, 1A, 1B: Voice input means
2: Noise suppression means
3: Speech section degree calculation means
4: Original sound addition rate determination means
5: Original sound addition means
6: Band dividing means
7: Inter-channel level difference calculation means
8: Sound source signal determination means
9: Sound source signal selection means
10: Power integration means
11: Original sound addition rate smoothing means
100: Sound collection device

Claims (9)

入力信号に含まれる雑音を抑圧し、目的信号を強調する雑音抑圧処理と、
入力信号の各区間毎における音声区間らしさを算出する音声区間度算出処理と、
前記音声区間度算出処理で算出された音声区間らしさに基づき前記雑音抑圧処理された目的信号に原音を付加する率を決定する原音付加率決定処理と、
前記原音付加率決定処理で決定した原音付加率に従って前記雑音抑圧処理された目的信号に原音を付加する原音付加処理と、
を含むことを特徴とする収音方法。
Noise suppression processing that suppresses noise contained in the input signal and emphasizes the target signal;
A voice segment degree calculation process for calculating the likelihood of a voice segment in each segment of the input signal;
An original sound addition rate determination process for determining a rate of adding the original sound to the target signal subjected to the noise suppression process based on the likelihood of the voice interval calculated in the voice interval degree calculation process;
Original sound addition processing for adding the original sound to the target signal subjected to the noise suppression processing according to the original sound addition rate determined in the original sound addition rate determination processing;
A sound collection method comprising:
2. The sound collection method according to claim 1, wherein the voice section degree calculation processing determines a section in which the power of the input signal is equal to or greater than a predetermined value to be a voice section, and a section in which the power is equal to or less than the predetermined value to be a noise section.
3. The sound collection method according to claim 1, wherein the voice section degree calculation processing determines a section in which the power of the input signal is equal to or greater than a first set value TS to be a voice section, determines a section in which the power of the input signal is equal to or less than a second set value TN, which is smaller than the first set value TS, to be a noise section, and, when the power of the input signal lies between the first set value TS and the second set value TN, determines the voice likeness according to the power of the input signal.
4. The sound collection method according to claim 1, wherein the voice section degree calculation processing uses at least a pair of input means installed at positions whose distances to the target sound source and to the noise source differ from each other; band-divides, channel by channel, the target signals containing noise signals captured by the pair of input means; compares the levels of the band components of the band-divided channels between channels for each identical band; determines, according to the level comparison result and in association with the difference in distance between the input means and the target sound source and noise source, whether each band component is a voice signal component or a noise signal component; integrates the power of the frequency band components determined to be voice signal components; and determines the voice likeness from the integrated power value.
5. The sound collection method according to any one of claims 1 to 4, wherein the difference between the original sound addition rate at the current time determined in the original sound addition rate determination processing and the original sound addition rate determined at the previous time is calculated, and the original sound addition rate is changed gradually toward the target original sound addition rate at the current time according to the value and polarity of the difference.
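The power-threshold decisions of claims 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `frame_power` and `voice_likeness`, the use of mean squared amplitude as the "power", and the linear interpolation between TN and TS are all assumptions. The claim itself only says that the voice likeness is determined according to the input power when it lies between the two set values.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples (assumed power measure)."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return sum(x * x for x in frame) / len(frame)


def voice_likeness(power, ts, tn):
    """Map frame power to a voice likeness in [0, 1].

    power >= ts  -> 1.0 (treated as a voice section)
    power <= tn  -> 0.0 (treated as a noise section)
    otherwise    -> linear interpolation (an assumption, not from the claims)
    """
    if not tn < ts:
        raise ValueError("the second set value TN must be smaller than TS")
    if power >= ts:
        return 1.0
    if power <= tn:
        return 0.0
    return (power - tn) / (ts - tn)
```

Setting TS and TN very close together approximates the single-threshold decision of claim 2, where a frame is classified as either voice or noise with no intermediate value.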
6. A sound collecting device comprising:
noise suppression means for suppressing noise in the output signal of a voice input means and emphasizing a target signal;
voice section degree calculation means for calculating the voice section likeness, at a given time, of the output signal of the voice signal input means or of the noise suppression means;
original sound addition rate determination means for determining an original sound addition rate α(t) (0 < α(t) < 1) based on the voice section likeness calculated by the voice section degree calculation means; and
original sound addition means for multiplying the output signal of the voice input means by the original sound addition rate α(t) (0 < α(t) < 1) determined by the original sound addition rate determination means and adding the result to the output signal of the noise suppression means.
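The original sound addition of claim 6 amounts to adding α(t) times the input signal to the noise-suppressed signal. The sketch below illustrates this under assumptions: the names `addition_rate`, `add_original`, `alpha_voice` and `alpha_noise` are hypothetical, and the linear mapping from voice section likeness to α(t) is an assumption, since the claims only require that α(t) be determined from the likeness and lie strictly between 0 and 1.

```python
def addition_rate(likeness, alpha_voice, alpha_noise):
    """Interpolate alpha between a noise-section and a voice-section value.

    Both endpoint values are hypothetical tuning parameters chosen by the
    caller; each must satisfy 0 < alpha < 1 as required by the claim.
    """
    for a in (alpha_voice, alpha_noise):
        if not 0.0 < a < 1.0:
            raise ValueError("addition rates must satisfy 0 < alpha < 1")
    v = min(max(likeness, 0.0), 1.0)  # clamp likeness to [0, 1]
    return alpha_noise + (alpha_voice - alpha_noise) * v


def add_original(suppressed, original, alpha):
    """Return suppressed[n] + alpha * original[n] for every sample n."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if len(suppressed) != len(original):
        raise ValueError("signals must have the same length")
    return [s + alpha * x for s, x in zip(suppressed, original)]
```

In use, `addition_rate` would be evaluated once per frame and the resulting α(t) passed to `add_original` together with that frame of the noise-suppressed and original signals.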
7. The sound collecting device according to claim 6, wherein a plurality of microphones placed apart from one another are used as the voice input means, the device comprising, as the voice section degree calculation means:
band division means for band-dividing the plurality of channel signals output by the plurality of microphones;
band-by-band inter-channel level difference calculation means for calculating, for each identical band of the channel signals divided by the band division means, the level difference between channels for that band component;
sound source signal determination means for determining, based on the inter-channel level difference of each band component, from which sound source each band-divided channel signal in that band was input;
sound source signal selection means for selecting, based on the determination of the sound source signal determination means, the signals input from the target sound source from among the band-divided channel signals;
power integration means for integrating the power of the output signal of the sound source signal selection means;
original sound addition rate determination means for determining the original sound addition rate from the power value integrated by the power integration means; and
original sound addition means for adding the original sound signal output by the voice input means to the target signal output by the noise suppression means according to the original sound addition rate determined by the original sound addition rate determination means.
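The band-by-band level comparison and power integration described in claims 4 and 7 can be sketched as below. This is an illustration under assumptions, not the patented implementation: band division is assumed to have been done elsewhere (for example by a filter bank), so the inputs are per-band powers; the names `level_difference_db` and `integrated_target_power`, the decibel threshold `threshold_db`, the rule that a band belongs to the target source when the nearer microphone is louder, and the choice to integrate the nearer channel's power are all hypothetical.

```python
import math


def level_difference_db(p_a, p_b, eps=1e-12):
    """Level difference between two band powers in decibels (a minus b)."""
    return 10.0 * math.log10((p_a + eps) / (p_b + eps))


def integrated_target_power(near_bands, far_bands, threshold_db=0.0):
    """Sum the power of bands attributed to the target sound source.

    near_bands / far_bands: per-band powers from the microphone closer to,
    and farther from, the target source (assumed arrangement).
    A band is attributed to the target when the near channel exceeds the
    far channel by more than threshold_db (assumed decision rule).
    """
    if len(near_bands) != len(far_bands):
        raise ValueError("both channels must have the same number of bands")
    total = 0.0
    for p_near, p_far in zip(near_bands, far_bands):
        if level_difference_db(p_near, p_far) > threshold_db:
            total += p_near
    return total
```

The integrated value could then be compared against set values such as TS and TN to obtain a voice likeness, in place of the raw input power.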
8. The sound collecting device according to claim 6 or 7, further comprising original sound addition rate smoothing means for obtaining the difference between the original sound addition rate at the current time determined by the original sound addition rate determination means and the original sound addition rate determined at the previous time, and for gradually changing the original sound addition rate toward the target original sound addition rate at the current time according to the value and polarity of the difference.
9. A sound collection program written in a computer-readable programming language that causes a computer to execute the sound collection method according to any one of claims 1 to 5.
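The gradual change of the original sound addition rate in claims 5 and 8 can be sketched as a rate limiter whose step size depends on the sign of the difference. This is a minimal sketch under assumptions: the name `smooth_rate` and the per-update step sizes `step_up` and `step_down` are hypothetical, and the claims do not specify how the value and polarity of the difference translate into a step.

```python
def smooth_rate(prev, target, step_up=0.05, step_down=0.02):
    """Move the addition rate from prev toward target by a bounded step.

    The step limit depends on the polarity of (target - prev); the move
    never overshoots the target.
    """
    if step_up <= 0.0 or step_down <= 0.0:
        raise ValueError("step sizes must be positive")
    diff = target - prev
    if diff > 0.0:
        return prev + min(diff, step_up)
    if diff < 0.0:
        return prev - min(-diff, step_down)
    return prev
```

Calling `smooth_rate` once per frame with the previous smoothed value and the newly determined rate yields a rate that approaches the target over several frames instead of jumping.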
JP2004065513A 2004-03-09 2004-03-09 Sound collection method, sound collection device, and sound collection program Expired - Fee Related JP4518817B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2004065513A JP4518817B2 (en) 2004-03-09 2004-03-09 Sound collection method, sound collection device, and sound collection program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2004065513A JP4518817B2 (en) 2004-03-09 2004-03-09 Sound collection method, sound collection device, and sound collection program

Publications (2)

Publication Number Publication Date
JP2005257748A JP2005257748A (en) 2005-09-22
JP4518817B2 JP4518817B2 (en) 2010-08-04

Family

ID=35083568

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2004065513A Expired - Fee Related JP4518817B2 (en) 2004-03-09 2004-03-09 Sound collection method, sound collection device, and sound collection program

Country Status (1)

Country Link
JP (1) JP4518817B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010102201A (en) * 2008-10-24 2010-05-06 Yamaha Corp Noise suppressing device and noise suppressing method
JP2011048302A (en) * 2009-08-28 2011-03-10 Fujitsu Ltd Noise reduction device and noise reduction program
JPWO2014192604A1 (en) * 2013-05-31 2017-02-23 ソニー株式会社 Encoding apparatus and method, decoding apparatus and method, and program
JP2019204074A (en) * 2018-05-21 2019-11-28 バイドゥ オンライン ネットワーク テクノロジー (ベイジン) カンパニー リミテッド Speech dialogue method, apparatus and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58181099A (en) * 1982-04-16 1983-10-22 三菱電機株式会社 Voice identifier
JPS5999497A (en) * 1982-11-29 1984-06-08 松下電器産業株式会社 Voice recognition equipment
JPS6267598A (en) * 1985-09-20 1987-03-27 株式会社リコー Voice section detection system
JPH10161694A (en) * 1996-11-28 1998-06-19 Nippon Telegr & Teleph Corp <Ntt> Band split type noise reducing method
WO1999030315A1 (en) * 1997-12-08 1999-06-17 Mitsubishi Denki Kabushiki Kaisha Sound signal processing method and sound signal processing device
JP2000082999A (en) * 1998-09-07 2000-03-21 Nippon Telegr & Teleph Corp <Ntt> Noise reduction processing method/device and program storage medium
JP2002073061A (en) * 2000-09-05 2002-03-12 Matsushita Electric Ind Co Ltd Voice recognition device and its method

Also Published As

Publication number Publication date
JP4518817B2 (en) 2010-08-04

Similar Documents

Publication Publication Date Title
US9881635B2 (en) Method and system for scaling ducking of speech-relevant channels in multi-channel audio
KR101461141B1 (en) System and method for adaptively controlling a noise suppressor
KR101227876B1 (en) Method and apparatus for maintaining speech audibility in multi-channel audio with minimal impact on surround experience
US10482896B2 (en) Multi-band noise reduction system and methodology for digital audio signals
JP6134078B1 (en) Noise suppression
US10242692B2 (en) Audio coherence enhancement by controlling time variant weighting factors for decorrelated signals
JP2013525843A (en) Method for optimizing both noise reduction and speech quality in a system with single or multiple microphones
Hendriks et al. Optimal near-end speech intelligibility improvement incorporating additive noise and late reverberation under an approximation of the short-time SII
CN112272848A (en) Background noise estimation using gap confidence
Marin-Hurtado et al. Perceptually inspired noise-reduction method for binaural hearing aids
JP2000330597A (en) Noise suppressing device
JP2006243644A (en) Method for reducing noise, device, program, and recording medium
JP2009296298A (en) Sound signal processing device and method
EP2230664B1 (en) Method and apparatus for attenuating noise in an input signal
JP4518817B2 (en) Sound collection method, sound collection device, and sound collection program
JP2005258158A (en) Noise removing device
CN110168640A (en) For enhancing the device and method for needing component in signal
KR101096091B1 (en) Apparatus for Separating Voice and Method for Separating Voice of Single Channel Using the Same
JP2021022872A (en) Sound collection device, sound collection program, and sound collection method
US20230138240A1 (en) Compensating Noise Removal Artifacts
US20230360662A1 (en) Method and device for processing a binaural recording
JP7264594B2 (en) Reverberation suppression device and hearing aid
JP2010028663A (en) Voice level adjusting device, voice level adjustment method, and program
JP2006313954A (en) Automatic sound volume control method, automatic sound volume control apparatus, program, and recording medium
Vashkevich et al. Speech enhancement in a smartphone-based hearing aid

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20060411

RD03 Notification of appointment of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7423

Effective date: 20060411

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20090626

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20090804

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20090917

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20091222

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20100316

A911 Transfer to examiner for re-examination before appeal (zenchi)

Free format text: JAPANESE INTERMEDIATE CODE: A911

Effective date: 20100329

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20100506

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20100518

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130528

Year of fee payment: 3

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140528

Year of fee payment: 4

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

LAPS Cancellation because of no payment of annual fees