JP2005257748A

JP2005257748A - Sound pickup method, sound pickup system, and sound pickup program

Info

Publication number: JP2005257748A
Application number: JP2004065513A
Authority: JP
Inventors: Mariko Aoki; 真理子青木; Suehiro Shimauchi; 末廣島内; Kenichi Furuya; 賢一古家; Akitoshi Kataoka; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-03-09
Filing date: 2004-03-09
Publication date: 2005-09-22
Anticipated expiration: 2024-03-09
Also published as: JP4518817B2

Abstract

PROBLEM TO BE SOLVED: To improve the S/N ratio of a voice signal on which noise is superimposed.

SOLUTION: A sound pickup method includes: noise suppression processing that suppresses the noise contained in an input signal and emphasizes a target signal; voice section degree calculation processing that calculates the voice section likelihood in each section of the input signal; original sound addition rate determination processing that determines, on the basis of the voice section likelihood calculated by the voice section degree calculation processing, the rate at which the original sound is added to the noise-suppressed target signal; and original sound addition processing that adds the original sound to the noise-suppressed target signal in accordance with the original sound addition rate determined by the original sound addition rate determination processing.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a sound collection method, a sound collection device, and a sound collection program that suppress noise while preserving speech quality by calculating, for a noise-suppressed signal, the speech section likelihood at each time Δt and automatically setting the original sound addition rate according to that likelihood.

Techniques have previously been proposed that add the original sound to a noise-suppressed signal in order to reduce the distortion of the processed sound and improve its perceived quality (Non-Patent Document 1).
In the original sound addition method disclosed in Non-Patent Document 1, the original sound addition rate is kept constant over the entire signal (all sections along the time axis).
Non-Patent Document 1: Junko Sasaki, Yoichi Haneda, "Examination of low distortion single-input noise reduction method considering masking effect," Proceedings of the Acoustical Society of Japan meeting, 2-3-11, pp. 525-526 (March 1997)

However, in an environment where the noise is extremely loud (louder than the target speech), for example, always adding the original sound at a constant rate high enough to improve speech quality also leaves a large amount of residual noise. It is therefore difficult to improve the S/N ratio while maintaining quality.
An object of the present invention is to propose a sound collection method, a sound collection device, and a sound collection program that can improve the S/N ratio over the entire signal along the time axis while maintaining quality in the speech sections.

Whereas the conventional method uses a constant original sound addition rate in all sections, the present invention calculates the speech section likelihood of the noise-suppressed signal. It aims to improve the S/N ratio over the signal as a whole beyond what the conventional method achieves, while preserving speech quality as far as possible in sections that are (likely to be) speech.
Specifically, the first embodiment of the present invention proposes a sound collection method including: noise suppression processing that suppresses noise contained in an input signal and emphasizes a target signal; speech section degree calculation processing that calculates the speech section likelihood for each section of the input signal; original sound addition rate determination processing that determines, based on the speech section likelihood calculated by the speech section degree calculation processing, the rate at which the original sound is added to the noise-suppressed target signal; and original sound addition processing that adds the original sound to the noise-suppressed target signal according to the original sound addition rate so determined.

In this first embodiment, a sound collection method is proposed in which the speech section degree calculation processing determines a section where the power of the input signal is at or above a predetermined value to be a speech section, and a section at or below the predetermined value to be a noise section.
Further, in the first embodiment, a sound collection method is proposed in which the speech section degree calculation processing determines a section where the power of the input signal is equal to or greater than a first set value TS to be a speech section, determines a section where the power of the input signal is equal to or less than a second set value TN, which is smaller than the first set value, to be a noise section, and, when the power of the input signal lies between said first set value TS and the second set value TN, determines the speech likelihood according to the power of the input signal.

Further, the second embodiment of the present invention proposes a sound collection method in which the speech section degree calculation processing uses at least one pair of voice input means placed at positions whose distances to the target sound source and to the noise source differ from each other. The target signal, including the noise signal, captured by each of the pair is divided into frequency bands channel by channel, and the band components are compared in level between channels for each identical band. According to the comparison result, and in correspondence with the difference in distance between the input means and the target sound source and noise source, each component is determined to be a speech signal component or a noise signal component. The power of the frequency band components determined to be speech signal components is integrated, and the speech likelihood is determined from the integrated power.
Further, the third embodiment of the present invention proposes a sound collection method in which smoothing processing is added so that the original sound addition rate changes gradually whenever it changes. This prevents the original sound addition rate from changing abruptly and so avoids distorting the speech.

For a received signal in which the target signal and a noise signal are mixed, the present invention calculates a physical quantity representing the speech section likelihood and automatically adjusts the original sound addition rate according to that likelihood. It can thereby output speech with little distortion in speech sections and a signal with a large S/N ratio improvement (little residual noise) in noise sections. As a result, it can output a speech signal that is easier to listen to and less noisy than the conventional method, which sets the original sound addition rate uniformly for all sections.

The embodiments that constitute the best mode for carrying out the present invention are described in detail below.
FIG. 1 shows a first embodiment of the present invention. The first embodiment is a sound collection device that operates using the sound collection method proposed in claim 1, and corresponds to the sound collection device proposed in claim 6.
The sound collection device 100 proposed in the first embodiment comprises voice input means 1 consisting of a microphone, noise suppression means 2, speech section degree calculation means 3, original sound addition rate determination means 4, and original sound addition means 5.

The voice input means 1 receives the target signal S1(t) and the noise signal S2(t) from the target sound source S and the noise source N. A single noise source N is assumed here to simplify the description, but in general there may be a plurality of noise sources N. The voice input means 1 outputs a signal x(t) in which the noise signal is superimposed on the target signal (hereinafter called the original sound signal).
The original sound signal x(t) output by the voice input means 1 is input to the noise suppression means 2. The noise suppression means 2 suppresses noise using a general method such as spectral subtraction. The noise-suppressed signal is denoted S1′(t).
The noise-suppressed signal S1′(t) is supplied to the speech section degree calculation means 3, which calculates the "speech section degree" representing the speech section likelihood.

As a method of calculating the speech section likelihood, the sound collection method proposed in claim 2 calculates the power (Pow) of the noise-suppressed signal S1′(t) and, if the power exceeds a certain value TS, judges the section likely to be speech and determines it to be a speech section. Conversely, if the power is TS or less, the section is determined to be a noise section.
In the sound collection method proposed in claim 3, certain values TS and TN are set (TS > TN), for example, and when Pow > TS is satisfied the speech section likelihood is judged to be 100%. When TN < Pow < TS, the closer Pow is to TS, the higher the speech section likelihood is judged to be. For example, (Pow − TN) / (TS − TN) is used as the quantity representing the speech section likelihood (the speech section degree).
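The claim 3 rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `frame_power` and `speech_degree` are hypothetical, mean squared amplitude is assumed as the definition of Pow (the text does not define it), and returning 0 at or below TN is an assumption consistent with the later statement that Pow < TN is treated as 100% noise.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples (assumed definition of Pow)."""
    return sum(s * s for s in frame) / len(frame)


def speech_degree(pow_value, ts, tn):
    """Speech section degree in [0, 1] for thresholds ts > tn.

    1.0 at or above ts, 0.0 at or below tn (assumption), and
    (Pow - TN) / (TS - TN) in between, as in the text.
    """
    if ts <= tn:
        raise ValueError("ts must be greater than tn")
    if pow_value >= ts:
        return 1.0
    if pow_value <= tn:
        return 0.0
    return (pow_value - tn) / (ts - tn)
```

For instance, with TS = 1.0 and TN = 0.0, a frame whose power is 0.5 lies halfway between the thresholds and receives a degree of 0.5.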

As shown in FIG. 2, the original signal x(t) may be used instead of the noise-suppressed signal S1′(t) when calculating the signal power Pow for the speech section degree. The configuration of FIG. 2 is advantageous in environments where, for example, the noise is not very loud relative to the target signal (S/N ratio of about 10 dB) and calculating the speech section degree after noise suppression would make the processing delay too long, for instance for communication use. In such environments, using the received signal for the speech section degree calculation allows the noise suppression processing and the speech section degree calculation to run in parallel, which shortens the processing delay at the cost of some loss in calculation accuracy.
The original sound addition rate determination means 4 dynamically determines the original sound addition rate according to the speech section degree calculated by the speech section degree calculation means 3. Two specific methods are given. The first applies when the speech section degree calculation means 3 classifies each section as one of two types, speech section or noise section.
In this case, for a section determined to be a speech section (Pow > TS), the original sound addition rate α(t) is set high (for example, 0.3) to give priority to speech quality. For a section determined to be a noise section (Pow < TN), the original sound addition rate α(t) is set low (for example, 0.05) to give priority to the S/N ratio improvement.
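A minimal sketch of this two-level rule follows. The rates 0.3 and 0.05 are the example values given in the text; the function name `binary_addition_rate` and the use of a single threshold (so that every frame falls into exactly one class) are assumptions made for illustration.

```python
ALPHA_SPEECH = 0.3   # example rate from the text for speech sections
ALPHA_NOISE = 0.05   # example rate from the text for noise sections


def binary_addition_rate(pow_value, threshold):
    """Return the original sound addition rate for a two-class decision.

    Assumption: one threshold separates speech from noise, so a frame
    whose power exceeds it is speech and every other frame is noise.
    """
    return ALPHA_SPEECH if pow_value > threshold else ALPHA_NOISE
```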

Next, an example of the operation of the original sound addition rate determination means 4 is described for the case where the speech section degree calculation means 3 sets certain values TS and TN (TS > TN) and expresses the speech section likelihood as (Pow − TN) / (TS − TN).
When Pow > TS is satisfied, the speech section likelihood is taken to be 100% and the maximum original sound addition rate α_max (for example, α_max = 0.5) is set as the addition rate α(t). When Pow < TN is satisfied, the noise section likelihood is taken to be 100% and the minimum original sound addition rate α_min (for example, α_min = 0.1) is set as the addition rate α(t). When TN < Pow < TS, the original sound addition rate α(t) is determined by the method shown in FIG. 3. That is, on a graph with Pow on the horizontal axis and the original sound addition rate α(t) on the vertical axis, the straight line passing through the two points (TN, α_min) and (TS, α_max) is obtained, and the original sound addition rate α(t) corresponding to each Pow is calculated from it. The above operation is expressed in programming-language form in Equation (1).

The original sound addition means 5 multiplies the original signal x(t) by the original sound addition rate α(t) determined by the original sound addition rate determination means 4, adds the result to the noise-suppressed signal S1′(t), and outputs the sum as the output signal y(t).
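Equation (1) is not reproduced in this text, so the following is only a sketch of the rule as the prose describes it: a straight line through (TN, α_min) and (TS, α_max), clamped outside the thresholds, followed by the addition y(t) = S1′(t) + α(t)·x(t). The function names `addition_rate` and `add_original` are hypothetical, the defaults are the example values from the text, and treating Pow equal to TS or TN as the clamped case is an assumption.

```python
def addition_rate(pow_value, ts, tn, alpha_min=0.1, alpha_max=0.5):
    """Original sound addition rate as a clamped linear function of Pow."""
    if ts <= tn:
        raise ValueError("ts must be greater than tn")
    if pow_value >= ts:
        return alpha_max
    if pow_value <= tn:
        return alpha_min
    # Straight line through (tn, alpha_min) and (ts, alpha_max).
    slope = (alpha_max - alpha_min) / (ts - tn)
    return alpha_min + slope * (pow_value - tn)


def add_original(suppressed, original, alpha):
    """y(t) = S1'(t) + alpha * x(t), applied sample by sample to one frame."""
    return [s + alpha * x for s, x in zip(suppressed, original)]
```

In this sketch one rate is applied to a whole frame; how finely α(t) is updated in time is not specified in the text.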

Next, a second embodiment of the present invention is described. The second embodiment corresponds to the sound collection method proposed in claim 4 and the sound collection device proposed in claim 7.
An example of the speech section degree calculation means 3 is described here. In the first embodiment, the signal used to calculate the speech section degree was either the signal S1′(t) noise-suppressed by the existing noise suppression means 2 or the original signal x(t). When the signal S1′(t) is used, the calculation accuracy of the speech section degree calculation means 3 may suffer if, for example, the noise suppression processing of the existing noise suppression means 2 does not work well. When the original signal x(t) is used, the speech section degree is difficult to calculate if the S/N ratio of the received signal is very poor (for example, 0 dB or negative).

FIG. 4 shows an example of a method used in the second embodiment to calculate the speech section degree as accurately as possible without being affected by the noise suppression means 2 or by the S/N ratio of the original signal x(t). The input means 1A and 1B of the speech section degree calculation means 3 consist of, for example, two or more microphones. The band dividing means 6 performs frequency analysis on the signals from the input means 1A and 1B, using the Fourier transform, for example. The inter-channel level difference calculation means 7 calculates, for each frequency component, the level difference ΔA(ω) between the channels (the input means 1A side and the 1B side). ΔA(ω) is defined by Equation (2).
ΔA(ω) = 20 log₁₀ [ |X₁(ω)| / |X₂(ω)| ]    (2)
The sound source signal determination means 8 determines, based on the value of the inter-channel level difference ΔA(ω), whether each frequency component belongs to the target signal or to the noise signal. For example, when the target sound source S is placed closer to the input means 1A than to the input means 1B and, conversely, the noise source N is placed closer to the input means 1B than to the input means 1A, as in FIG. 4, a frequency component satisfying ΔA(ω) ≤ 0 is determined to be a noise signal component.
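Equation (2) and the decision rule can be sketched as follows. The spectra are assumed to be given as equal-length lists of complex bin values for channels 1A and 1B; the function names and the small constant `eps`, added only to avoid division by zero and log of zero, are assumptions and not part of the text.

```python
import math


def level_difference_db(x1, x2, eps=1e-12):
    """Per-bin inter-channel level difference 20*log10(|X1| / |X2|) in dB."""
    return [
        20.0 * math.log10((abs(a) + eps) / (abs(b) + eps))
        for a, b in zip(x1, x2)
    ]


def is_noise_component(delta_a):
    """Bins with a level difference of 0 dB or less are treated as noise,
    for the layout in which the target source is nearer to channel 1."""
    return delta_a <= 0.0
```

A bin that is ten times larger in magnitude on channel 1 than on channel 2 yields about +20 dB and is kept as a target signal component.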

Based on the determination result of the sound source signal determination means 8, the sound source signal selection means 9 multiplies target signal components by a gain value g_S and noise signal components by a gain value g_N. The control expression of the sound source signal selection means 9 is shown in programming-language form in Equation (3).
if ΔA(ω) ≥ 0 then Ŝ₁(ω) = g_S · X₁(ω)
else if ΔA(ω) ≤ 0 then Ŝ₁(ω) = g_N · X₁(ω)    (3)
As example gain values, g_S = 1.0 and g_N = 0.0 are given.

The power integration means 10 integrates the power of the signal Ŝ₁(ω) weighted by the sound source signal selection means 9 over the entire frequency band, and sends the integrated power value Pow to the original sound addition rate determination means 4. The original sound addition rate determination means 4 uses this integrated power value Pow to determine the addition rate by the same control as described in the first embodiment. The speech section degree calculation means 3 shown in FIG. 4 uses the direction information of the sound sources to determine whether each frequency component of the original signal belongs to the speech signal or to the noise signal. Bands determined to be speech spectrum are emphasized by multiplying by a gain of 1.0, and bands determined to be noise spectrum are suppressed by multiplying by a gain of 0. This is substantially equivalent to improving the S/N ratio of the original signal x(t), so the speech section degree is calculated after the S/N ratio has been improved. For a signal with a poor S/N ratio, the speech section degree can therefore be calculated more accurately than by using the original signal as it is. In addition, because the speech section degree is calculated independently of the existing noise suppression processing, it is unaffected even when the performance of that processing is poor.
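The selection and power integration steps can be combined into one short function. This sketch assumes the two channel spectra are lists of complex bin values and uses the fact that ΔA(ω) ≥ 0 is equivalent to |X₁(ω)| ≥ |X₂(ω)|, so no logarithm is needed; the function name `integrated_speech_power` is hypothetical, and the default gains are the example values from the text.

```python
def integrated_speech_power(x1, x2, g_s=1.0, g_n=0.0):
    """Sum over all bins of |gain * X1|^2, where the gain is g_s for bins
    judged to be target signal and g_n for bins judged to be noise."""
    total = 0.0
    for a, b in zip(x1, x2):
        # |X1| >= |X2| corresponds to a level difference of 0 dB or more.
        gain = g_s if abs(a) >= abs(b) else g_n
        total += abs(gain * a) ** 2
    return total
```

The returned value plays the role of Pow and can be passed to the same thresholding against TS and TN that the first embodiment describes.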

Next, a third embodiment of the present invention is described. The third embodiment corresponds to the sound collection method proposed in claim 5 and the sound collection device proposed in claim 8.
A configuration example is shown in FIG. 5. The third embodiment is characterized by adding, to the original sound addition rate determination means 4 described in the first embodiment, original sound addition rate smoothing means 11 that smooths the change over time of the original sound addition rate α(t) whenever it changes.
To smooth the change over time of the original sound addition rate α(t), the difference between the original sound addition rate α_pre(t) at the previous time and the original sound addition rate α(t) at the current time is calculated, and the original sound addition rate at the current time is determined according to the polarity (+, −) of that difference. An example of the method is shown in programming-language form in Equation (4).
if α_pre(t) < α(t) then α_smooth(t) = α_pre(t) + atk · (α(t) − α_pre(t))
else α_smooth(t) = α_pre(t) + rls · (α(t) − α_pre(t))    (4)
Here, atk and rls are values satisfying 0 < atk < 1 and 0 < rls < 1. With this operation, the original sound addition rate does not change greatly at each time step but only slightly increases or decreases from the previous time, so its change over time is smooth and the distortion of the processed sound is reduced. Whether it moves up or down is determined by the difference between the original sound addition rate α(t) at the current time and the rate at the previous time: it increases slightly when the current rate is higher than the previous one and decreases slightly when it is lower. The third embodiment can of course also be used together with the second embodiment.
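Equation (4) translates directly into code. In this sketch the function names are hypothetical, the default values of `atk` and `rls` are arbitrary choices within the stated range 0 < atk, rls < 1, and `smooth_sequence` assumes that the "previous" rate is the previous smoothed output, which the text does not state explicitly.

```python
def smooth_rate(alpha_pre, alpha, atk=0.5, rls=0.1):
    """One step of Equation (4): move from alpha_pre toward alpha,
    using atk when the rate rises and rls when it falls or is unchanged."""
    coeff = atk if alpha_pre < alpha else rls
    return alpha_pre + coeff * (alpha - alpha_pre)


def smooth_sequence(alphas, atk=0.5, rls=0.1):
    """Apply smooth_rate along a sequence of target rates.

    Assumption: the previous value fed into each step is the previous
    smoothed output, initialised to the first target rate.
    """
    out = []
    prev = alphas[0] if alphas else 0.0
    for a in alphas:
        prev = smooth_rate(prev, a, atk, rls)
        out.append(prev)
    return out
```

Choosing atk larger than rls makes the rate rise quickly at speech onsets and decay slowly afterwards; the text leaves the relative sizes of the two coefficients open.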

As described above, the means set out here make it possible to calculate a physical quantity representing the speech section likelihood from a signal mixed with noise and to adjust the original sound addition rate automatically according to that likelihood. Compared with the conventional method of setting the original sound addition rate uniformly for all sections, distortion is reduced in speech sections and the S/N ratio improvement is secured in noise sections, so a signal that is easy to listen to overall and has little residual noise can be extracted and collected.
The sound collection method of the present invention described above is realized by causing a computer to execute the sound collection program according to the present invention. The sound collection program is written in a computer-readable programming language, recorded on a recording medium such as a magnetic disk or a CD-ROM, and installed on the computer either from the recording medium or through a communication line. The installed sound collection program is interpreted and executed by the central processing unit (CPU) of the computer.

The sound collection method and sound collection device according to the present invention can be applied, for example, to the sound collection device of a speech recognition device, and can improve the recognition rate of the speech recognition device by improving the quality of the collected speech signal.

FIG. 1 is a block diagram for explaining the first embodiment of the present invention.
FIG. 2 is a block diagram for explaining a modification of the first embodiment shown in FIG. 1.
FIG. 3 is a graph for explaining the operation of the first embodiment shown in FIG. 1.
FIG. 4 is a block diagram for explaining the second embodiment of the present invention.
FIG. 5 is a block diagram for explaining the third embodiment of the present invention.

Explanation of symbols

1, 1A, 1B: Voice input means
2: Noise suppression means
3: Voice interval degree calculation means
4: Original sound addition rate determining means
5: Original sound addition means
6: Band dividing means
7: Inter-channel level difference calculating means
8: Sound source signal judgment means
9: Sound source signal selection means
10: Power integrating means
11: Original sound addition rate smoothing means
100: Sound collecting device

Claims

Noise suppression processing that suppresses noise contained in the input signal and emphasizes the target signal;
A voice segment degree calculation process for calculating the likelihood of a voice segment in each segment of the input signal;
An original sound addition rate determination process for determining a rate of adding the original sound to the target signal subjected to the noise suppression process based on the likelihood of the voice interval calculated in the voice interval degree calculation process;
Original sound addition processing for adding the original sound to the target signal subjected to the noise suppression processing according to the original sound addition rate determined in the original sound addition rate determination processing;
A sound collection method comprising:

2. The sound collection method according to claim 1, wherein the voice interval degree calculation processing determines a section where the power of the input signal is at or above a predetermined value to be a voice section, and determines a section at or below the predetermined value to be a noise section.

The sound collection method according to claim 1, wherein the voice segment degree calculation processing determines a section in which the power of the input signal is equal to or greater than a first set value TS to be a voice section, determines a section in which the power of the input signal is equal to or less than a second set value TN, which is smaller than the first set value, to be a noise section, and, when the power of the input signal is between said first set value TS and the second set value TN, determines the speech likelihood according to the power of the input signal.

The sound collection method according to claim 1, wherein the speech segment degree calculation processing uses at least a pair of input means installed at positions where the distances to the target sound source and the noise source differ from each other; the target signal, including the noise signal, captured by each of the pair of input means is band-divided for each channel; the band components of the band-divided channels are compared in level between channels for each identical band component; according to the level comparison result, each component is determined to be a voice signal component or a noise signal component in association with the distance difference between the input means and the target sound source and the noise source; the power of the frequency band components determined to be voice signal components is integrated; and the speech likelihood is determined from the integrated value of this power.

5. The sound collection method according to claim 1, wherein a difference between the original sound addition rate at the current time determined by the original sound addition rate determination processing and the original sound addition rate determined at the previous time is calculated, and the original sound addition rate is changed gradually toward the target original sound addition rate at the current time according to the polarity of the difference.

Noise suppression means having a function of suppressing noise with respect to the output signal of the voice input means and emphasizing a target signal;
A voice interval degree calculating means for calculating the likelihood of a voice interval at a certain time of the output signal of the voice signal input means or the noise suppression means;
Original sound addition rate determining means for determining the original sound addition rate α(t) (0 < α(t) < 1) based on the likelihood of the voice interval calculated by the voice interval degree calculating means;
Original sound adding means for multiplying the output signal of the voice input means by the original sound addition rate α(t) (0 < α(t) < 1) determined by the original sound addition rate determining means and then adding it to the output signal of the noise suppressing means;
A sound collecting device comprising:

The sound collecting device according to claim 6,
A plurality of microphones spaced apart from one another, provided as the voice input means; and, provided as the voice interval degree calculating means, band dividing means for dividing into frequency bands the plurality of channel signals output from the plurality of microphones, and inter-channel level difference calculating means for calculating, for each identical band of the channel signals divided by the band dividing means, the level difference between channels;
Sound source signal determination means for determining, based on the inter-channel level difference of each band component, from which sound source each band of the band-divided channel signals was input;
Based on the determination of the sound source signal determination means, sound source signal selection means for selecting a signal input from the target sound source from each of the band-divided channel signals,
Power integrating means for integrating the power of the output signal of the sound source signal selecting means;
Original sound addition rate determining means for determining the original sound addition rate based on the power value integrated by the power integration means;
Original sound addition means for adding the original sound signal output by the voice input means to the target signal output by the noise suppression means according to the original sound addition ratio determined by the original sound addition ratio determination means;
A sound collecting device comprising:

8. The sound collecting device according to claim 6, further comprising original sound addition rate smoothing means for obtaining a difference between the original sound addition rate at the current time determined by the original sound addition rate determination means and the original sound addition rate determined at the previous time, and for gradually changing the original sound addition rate toward the target original sound addition rate at the current time according to the polarity of the difference.

6. A sound collection program that is written in a computer-readable program language and causes the computer to execute the sound collection method according to claim 1.