JP2913105B2

JP2913105B2 - Sound signal detection method

Info

Publication number: JP2913105B2
Application number: JP2059641A
Authority: JP
Inventors: 豊金田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1989-03-10
Filing date: 1990-03-09
Publication date: 1999-06-28
Anticipated expiration: 2014-06-28
Also published as: JPH0327698A

Description

[Field of Industrial Application] The present invention relates to a sound detection method that detects the time sections in which a desired sound signal is present within a signal containing both noise and the desired sound signal.

[Related Art] Speech recognition devices have advanced remarkably in recent years, but the development of noise-robust speech recognition devices has lagged behind. The reason is that it is difficult to perform speech section detection (determining the time sections on the time axis in which speech is present) correctly in a noisy environment. If a noise section is mistakenly judged to be speech, the recognizer forcibly matches the noise to some phoneme, and a correct recognition result cannot be obtained. Developing a speech section detection technique that works well even in noise is therefore considered very important.

FIG. 13 illustrates a first conventional speech section detection method. It shows how the short-time power of a signal changes over time: the vertical axis is the short-time power of the signal output from a microphone, and the horizontal axis is time. In this specification, "power" means short-time power unless otherwise noted. The signal contains stationary noise 11 (noise whose power is nearly constant over time, such as air-conditioning noise or equipment fan noise), non-stationary noise 12 (noise whose power fluctuates greatly over time, such as a door closing or unwanted speech), and the desired speech 13. The power of stationary noise can be known in advance, but that of non-stationary noise cannot be predicted. The first conventional method continuously monitors the signal power and judges any time section in which the power exceeds a threshold Th (reference numeral 14), determined from the stationary noise power, to be a speech section. Most current speech recognition devices detect speech sections this way.
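The first conventional method can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the non-overlapping framing, and the choice of Th as three times the stationary noise power are all assumptions made for the example.

```python
def short_time_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed framing)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def sections_above(values, threshold):
    """Return (start, end) frame-index pairs, end exclusive, where values exceed threshold."""
    sections, start = [], None
    for i, v in enumerate(values):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(values)))
    return sections


# Toy short-time power track: stationary noise (1.0), a door slam (9.0), then speech (6.0).
power = [1.0, 1.0, 9.0, 9.0, 1.0, 1.0, 6.0, 6.0, 6.0, 1.0]
stationary_noise_power = 1.0
th = 3.0 * stationary_noise_power  # hypothetical margin above the stationary noise
detected = sections_above(power, th)
```

On this toy track both the door slam and the speech exceed Th, so both are reported as speech sections, which is the weakness discussed next.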

With this method, the correct speech section 16 shown in FIG. 13 can be detected, but the high-power non-stationary noise section 15 is also mistakenly judged to be a speech section, which is a serious problem. A second conventional method addresses this by using two microphones, installed so that one has a high S/N ratio between speech and ambient noise and the other a low S/N ratio, that is, so that the two microphone outputs differ in S/N ratio. Concretely, as shown in FIG. 14(a), the first microphone 1 can be placed near the speaker 3 and the second microphone 2 far from the speaker 3; or, as shown in FIG. 14(b), the first microphone 1 can be placed in front of the speaker 3 and the second microphone 2 to the side of the speaker 3. With either arrangement, the speech power output by the first microphone 1 is greater than that output by the second microphone 2, whereas, assuming the noise originates far away, the noise power at the outputs of microphones 1 and 2 is nearly equal. As a result, the outputs of the two microphones 1 and 2 differ in S/N ratio.

FIG. 15 illustrates the ideal operation of the second conventional method. FIG. 15(a) shows the temporal change of the short-time power P1 of the first microphone output, and FIG. 15(b) that of the short-time power P2 of the second microphone output. As in FIG. 13, 11 denotes stationary noise, 12 non-stationary noise, and 13 speech. Because the two microphones are installed so that their S/N ratios differ, the speech power in P2 is smaller than the speech power in P1, while the noise power is the same in both. In the second conventional method, as shown in FIG. 15(c), the difference PD between the short-time powers of the two signals (PD = P1 − P2) is computed, and the time section 18 in which PD exceeds a threshold Pth (reference numeral 17) is judged to be a speech section. FIG. 15(c) shows that, unlike the first conventional method, the second does not mistake the section of high-power non-stationary noise 12 for a speech section.
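The ideal behavior of FIG. 15 can be reproduced with a small sketch. The function name, the toy power values, and the threshold are assumptions chosen for illustration; only the rule "PD = P1 − P2 compared with Pth" comes from the text.

```python
def detect_by_power_difference(p1, p2, p_th):
    """Return (PD, sections) where sections are (start, end) pairs, end exclusive, with PD > p_th."""
    pd = [a - b for a, b in zip(p1, p2)]
    sections, start = [], None
    # A sentinel at the end closes a section that runs to the last frame.
    for i, d in enumerate(pd + [float("-inf")]):
        if d > p_th and start is None:
            start = i
        elif d <= p_th and start is not None:
            sections.append((start, i))
            start = None
    return pd, sections


# Ideal case: noise has equal power in both outputs, speech is stronger in P1.
p1 = [1.0, 1.0, 9.0, 9.0, 1.0, 1.0, 6.0, 6.0, 6.0, 1.0]
p2 = [1.0, 1.0, 9.0, 9.0, 1.0, 1.0, 2.0, 2.0, 2.0, 1.0]
pd, sections = detect_by_power_difference(p1, p2, p_th=2.0)
```

Here the noise burst cancels in PD and only the speech frames remain above the threshold, provided the two tracks are perfectly aligned in time.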

In practice, however, this second conventional method rarely works so ideally. The reason is that detecting speech sections correctly from the power difference between two signals requires the following three conditions to be satisfied.

Condition 1: The two signals differ in S/N ratio.

Condition 2: The noise sections and the speech sections in the two signals are both aligned in time.

Condition 3: The difference in S/N ratio varies little under changes in the environmental conditions (stability of the S/N ratio difference).

The second conventional method, however, attends only to the first condition and ignores the second and third, which gives rise to the problems described below.

First, the first problem. FIG. 16 is FIG. 14(a) with a noise source 4 added. In this arrangement, speech reaches the first microphone 1 first and the second microphone 2 afterwards, whereas noise reaches the second microphone 2 first and the first microphone 1 afterwards. The speech sections and noise sections in the two microphone output signals are therefore not aligned in time.

This is shown in FIG. 17. FIG. 17(a) shows the short-time power P1 of the first microphone output, FIG. 17(b) the short-time power P2 of the second microphone output, and FIG. 17(c) their difference PD. As in FIG. 15, 11 denotes stationary noise, 12 non-stationary noise, and 13 speech.

The relative magnitudes of the speech and noise powers in FIGS. 17(a) and (b) are the same as in FIGS. 15(a) and (b). In FIG. 17, however, the speech in the second microphone output lags the first microphone output by the time τS (reference numeral 31), and the noise leads it by the time τN (reference numeral 32). That is, neither the speech sections nor the noise sections are aligned in time. As a result, the power difference PD of the two signals in FIG. 17(c) differs from that in FIG. 15(c), and if sections where PD is at or above the threshold Pth (reference numeral 17) are judged to be speech sections, the section indicated by reference numeral 33 in FIG. 17(c) is mistakenly judged to be speech. This is the first problem. Because the time difference τN of the noise section varies greatly with the position of the noise source, the two signals cannot be aligned with a delay element or similar means.
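The effect of the misalignment can be seen in a toy calculation. The power values, the one-frame shifts standing in for τN and τS, and the threshold are assumptions for illustration only.

```python
p_th = 2.0  # hypothetical threshold Pth

# P1: noise burst in frames 3-4, speech in frames 7-9.
p1 = [1.0, 1.0, 1.0, 9.0, 9.0, 1.0, 1.0, 6.0, 6.0, 6.0, 1.0, 1.0]
# P2: the same noise arrives one frame earlier, the (weaker) speech one frame later.
p2 = [1.0, 1.0, 9.0, 9.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 1.0]

pd = [a - b for a, b in zip(p1, p2)]
flagged = [i for i, d in enumerate(pd) if d > p_th]  # frames judged to be speech
```

Frame 4 is flagged even though it contains only noise: the tail of the noise burst in P1 has no counterpart in P2 at that instant, which corresponds to the falsely detected section 33 in FIG. 17(c).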

The second problem is that, in a real environment, various factors cause the S/N ratio difference between the two microphone output signals to fluctuate, so that it is difficult to keep that difference stable.

The first fluctuation factor is the position of the noise source. The explanation above assumed a distant noise source, but when the noise source is relatively close, its position becomes a major cause of variation in the S/N ratio difference. FIG. 18 shows an example. In FIGS. 18(a) and (b), as in FIG. 16, 1 and 2 are the first and second microphones, 3 is the speaker, and 4 is the noise source. With the noise source at the positions shown in these two figures, the noise power at the output of the first microphone 1 is greater than that at the output of the second microphone 2, just as the speech power is. As a result, the S/N ratio difference between the two microphone outputs becomes small.

The second fluctuation factor is movement of the speaker. For example, if the speaker in FIG. 18(b) turns his or her head 45° to the right, the two microphones pick up the speech with nearly the same power. The speech power difference between the outputs of microphones 1 and 2 then disappears, and the S/N ratio difference changes.

The third fluctuation factor is the influence of room reflections. In most cases where the two microphones 1 and 2 are installed so that their S/N ratios differ, reflected sounds with different temporal structures and magnitudes are added to the noise and speech at each microphone, and the S/N ratio consequently fluctuates greatly over time.

There are many other fluctuation factors as well, such as electrical noise and vibration noise. In an environment where these factors are present, it is extremely difficult to secure a stable S/N ratio difference, and it is not easy to find a microphone arrangement in which the second conventional method works effectively.

The second conventional method thus has serious problems and cannot deliver adequate performance in practice.

Next, a third conventional method, which aims to solve the problems of the second conventional method, is described with reference to FIG. 19. In FIG. 19, as in the preceding examples, 1 is the first microphone and 2 is the second microphone. Further, 21 is a short-time power calculation unit, 22 is a speech section candidate selection unit, 23 and 24 are units that calculate the average power within a speech section candidate, 25 is a power difference detection unit, and 26 is a speech section candidate verification unit.

In this method, as in the second conventional method, the first microphone 1 is installed so that its S/N ratio between speech and ambient noise is high, and the second microphone 2 so that its S/N ratio is lower than that of microphone 1. First, the short-time power calculation unit 21 computes the short-time power of the output signal of the first microphone. Next, the speech section candidate selection unit 22 continuously monitors this short-time power and selects, as speech section candidates, the time sections in which it exceeds a threshold Th determined from the stationary noise power. Up to this point the operation is identical to that of the first conventional method shown in FIG. 13, so the noise section indicated by reference numeral 15 in FIG. 13 is also selected as a candidate. Next, the average power calculation units 23 and 24 compute the average power of the output of the first microphone 1 and of the second microphone 2 within the candidate section, and the power difference detection unit 25 obtains the difference PDL between these average powers. Finally, the speech section candidate verification unit 26 accepts the candidate as a speech section if PDL is greater than a predetermined threshold PDLt and rejects it if PDL is smaller.
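The processing chain of units 22 to 26 can be sketched as below. The function names, toy power tracks, and threshold values are assumptions for illustration; the short-time powers are taken as already computed (unit 21).

```python
def candidate_sections(power, th):
    """Unit 22: (start, end) pairs, end exclusive, where the power exceeds th."""
    sections, start = [], None
    for i, p in enumerate(list(power) + [float("-inf")]):  # sentinel closes a trailing section
        if p > th and start is None:
            start = i
        elif p <= th and start is not None:
            sections.append((start, i))
            start = None
    return sections


def mean(values):
    return sum(values) / len(values)


def verify_candidates(p1, p2, th, pdl_t):
    """Units 23-26: keep candidates whose average power difference PDL exceeds pdl_t."""
    accepted = []
    for start, end in candidate_sections(p1, th):
        pdl = mean(p1[start:end]) - mean(p2[start:end])
        if pdl > pdl_t:
            accepted.append((start, end))
    return accepted


# Noise burst (9.0) arrives one frame early in P2; speech (6.0 vs 2.0) arrives one frame late.
p1 = [1.0, 1.0, 9.0, 9.0, 9.0, 9.0, 1.0, 1.0, 6.0, 6.0, 6.0, 6.0, 1.0, 1.0]
p2 = [1.0, 9.0, 9.0, 9.0, 9.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 1.0]
accepted = verify_candidates(p1, p2, th=3.0, pdl_t=3.0)
```

In this toy case the noise candidate has an average difference of 2.0 and is rejected, while the speech candidate has 4.25 and is accepted, despite the one-frame misalignment in both.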

The distinctive point of this third conventional method is that it uses not the difference in short-time power but the difference in average power over a relatively long section, namely the speech section candidate selected from the output of the first microphone 1. Therefore, even if the speech and noise sections in the two microphone outputs are not aligned in time, as in FIGS. 17(a) and (b), and even if reflected sounds with different temporal structures are added to the two signals and make the S/N ratio fluctuate over time, the effect on the average power difference is small, and the problems of the second conventional method are mitigated.

[Problems to be Solved by the Invention] However, because this method decides on speech sections from the average power within a candidate section, it gives a wrong result when a noise section and a speech section occur back to back. FIG. 20 shows such a case. It represents the output of the first microphone 1, and the correct speech section is section 34. Because the non-stationary noise 12 and the speech 13 are close together in time, section 35, in which the short-time power exceeds the threshold Th (reference numeral 14) and which merges the noise section and the speech section, is selected as the speech section candidate. If the average power difference leads to this candidate being accepted as a speech section, the section indicated by reference numeral 36 in FIG. 20 is a false detection; if the candidate is rejected, genuine speech is treated as non-speech. Either way the result is wrong.
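The merging effect of FIG. 20 can be illustrated with a short sketch. The power values, the threshold, and the helper name are assumptions for the example.

```python
from itertools import groupby


def runs_above(power, th):
    """(start, end) pairs, end exclusive, of consecutive frames with power > th."""
    runs, i = [], 0
    for above, group in groupby(power, key=lambda p: p > th):
        n = len(list(group))
        if above:
            runs.append((i, i + n))
        i += n
    return runs


# Noise burst (frames 2-4) immediately followed by speech (frames 5-8).
p1 = [1.0, 1.0, 9.0, 9.0, 9.0, 6.0, 6.0, 6.0, 6.0, 1.0]
true_speech = (5, 9)
candidates = runs_above(p1, th=3.0)

# Frames that become false detections if the single merged candidate is accepted.
false_alarm = [f for f in range(*candidates[0])
               if not true_speech[0] <= f < true_speech[1]]
```

Only one candidate is produced, spanning both the noise and the speech. Accepting it adds the noise frames to the speech section; rejecting it discards the speech.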

It follows that the third conventional method does not actually resolve the problems of the second conventional method.

Because conventional speech section detection methods have the problems described above, it has been difficult to detect speech sections correctly in the presence of non-stationary noise.

The main object of the present invention is therefore to provide a method that can detect speech sections in a non-stationary noise environment with a higher probability than before.

Another object of the present invention is to provide a method that can detect speech sections wherever the noise source is located, except near the speaker (within ±30 degrees of the speaker's direction as seen from the microphone).

[Means for Solving the Problems] To achieve these objects, the present invention has the following essential requirements. As stated above, detecting speech sections correctly from the power difference between two signals requires the following three conditions.

Condition 1: The two signals differ in S/N ratio.

Condition 2: The noise sections and the speech sections in the two signals are both aligned in time.

Condition 3: The difference in S/N ratio varies little under changes in the environmental conditions (stability of the S/N ratio difference).

The first feature of the present invention is that, in order to satisfy the first and second conditions simultaneously, two sound receivers that produce signals with different S/N ratios are installed at the same location (not the same location in the strict sense, but locations that can be regarded as substantially the same for the invention to work effectively), and speech sections are detected from the power difference between their two output signals. The second feature is that, in order to satisfy the third condition, one of the two sound receivers is a microphone array system with a directivity control function.

[Operation] According to the first feature of the present invention, both noise and speech reach the two sound receivers at the same time, so the noise sections and the speech sections in the two receiver output signals are aligned in time. The first problem of the second conventional method is thereby solved. Moreover, if the two receivers are installed at the same position, the reflected sound added to each signal has the same temporal structure, so the influence of reflections on the variation of the S/N ratio difference between the two receiver outputs, described as part of the second problem of the second conventional method, is greatly reduced.

According to the second feature of the present invention, the influence of the noise source position and of speaker movement on the variation of the S/N ratio difference between the two receiver outputs, also described under the second problem of the second conventional method, can be reduced.

[Embodiment] FIG. 1 shows the configuration of the present invention. In FIG. 1, 41 is a first sound receiver (a microphone array system) that outputs a signal with a high S/N ratio; it consists of a microphone array 51, made up of a plurality of microphone elements, and a directivity control unit 52. 42 is a second sound receiver that outputs a signal with a lower S/N ratio than the output of the first receiver. The two receivers are installed at the same location. 43 and 44 are short-time power calculation units, and 45 is a speech section detection unit that operates on the power difference between the two signals.

To explain the effect of the present invention, first consider a variant of the configuration of FIG. 1 in which a unidirectional microphone is used as the first sound receiver 41 in place of the microphone array system, and an omnidirectional microphone is used as the second sound receiver 42. The S/N ratio of the output of the first receiver, whose directivity is aimed at the speaker, is then higher than that of the output of the second receiver, which has no directivity.

This approach does not always work well, however, as FIG. 2 shows. In FIG. 2, 61 is the directivity pattern of the unidirectional microphone, 62 is that of the omnidirectional microphone, 3 is the speaker, and 63 and 64 are noise source positions. As FIGS. 2(a) and (b) show, the unidirectional microphone has high sensitivity in the front direction, toward the speaker, and low sensitivity in the opposite direction, while the omnidirectional microphone has the same sensitivity in all directions. Thus, when the noise source is at position 63 in FIGS. 2(a) and (b), the S/N ratio of the unidirectional microphone output is much higher than that of the omnidirectional microphone. But when the noise source is at, or moves to, position 64, for example, the unidirectional microphone becomes more sensitive to the noise, and the S/N ratio difference between the two outputs becomes small. Using a unidirectional microphone as the first receiver therefore has the problem that the S/N ratio varies greatly with the position of the noise source.
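The dependence on noise source position can be made concrete with a small calculation. It assumes an ideal cardioid pattern, 0.5(1 + cos θ), for the unidirectional microphone, an ideal omnidirectional pattern for the other, and a speaker exactly on-axis; the patent does not specify the pattern, so these are illustrative assumptions.

```python
import math


def cardioid_gain(theta):
    """Amplitude sensitivity of an ideal cardioid at angle theta (radians) off-axis."""
    return 0.5 * (1.0 + math.cos(theta))


def snr_advantage_db(noise_angle):
    """S/N ratio of the cardioid output minus that of the omnidirectional output, in dB.

    The speaker is assumed on-axis (gain 1 for both microphones), so the
    difference comes only from how strongly the cardioid attenuates the noise.
    """
    g = cardioid_gain(noise_angle)
    return float("inf") if g == 0.0 else -20.0 * math.log10(g)


behind = snr_advantage_db(math.radians(150))  # noise source well behind the microphone
front = snr_advantage_db(math.radians(30))    # noise source close to the speaker direction
```

Under these assumptions the S/N ratio difference is over 20 dB for a noise source at 150° but under 1 dB at 30°, so a fixed threshold on the power difference cannot suit both positions.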

It might seem that this problem with the unidirectional microphone could be solved by using a super-directional receiver such as that shown in FIG. 3 as the first sound receiver 41 of FIG. 1. However, the directivity of an ordinary super-directional receiver depends on frequency: at low frequencies it is broad, like pattern 61 in FIG. 2(a), and at high frequencies it is even sharper than that shown in FIG. 2(a). Consequently, for low-frequency noise the S/N ratio varies with the position of the noise source, as described above, and at high frequencies the S/N ratio varies with even a slight movement of the speaker.

As explained above, a well-known directional receiver is difficult to use as the first sound receiver 41 in the configuration of FIG. 1 if good speech section detection results are to be obtained.

Next, it is explained that the present invention, which uses a microphone array system with a directivity control function, keeps the variation in S/N ratio small regardless of the noise source position and speaker movement.

A representative microphone array system with a directivity control function is the receiver known as an adaptive (microphone) array. FIG. 4 shows one configuration of an adaptive array. In FIG. 4, 51 is a microphone array consisting of M microphone elements 56_1 to 56_M.

52 is the directivity control unit, which consists of filters 53_1 to 53_M connected to the respective microphone outputs, an adder 55 that sums the filter outputs, and a filter characteristic control unit 54.

The filter characteristic control unit 54 receives each microphone output signal and the output x1 of the adder 55, and controls the characteristics of filters 53_1 to 53_M so as to reduce the noise component contained in x1.

The operating principle of the filter characteristic control unit 54 is as follows. The output signal x1 of the adder 55 is the sum of a speech component s and a noise component n:

x1 = s + n    (1)

If filter characteristics that minimize the noise power n² are sought with no conditions imposed, filters 53_1 to 53_M all become zero-gain filters. The noise component n is then zero, its minimum, but the speech component s is not output either, which is a meaningless result. A constraint is therefore placed on the speech component s contained in the signal x1 produced by the filtering, and the filter characteristics that minimize the noise component n contained in x1 are sought under that constraint. Known examples of such constraints, where s0 denotes the speech component contained in a microphone output signal (filter input signal), include the constraint s = s0 and the condition that the mean of |s − s0|² not exceed a predetermined threshold.

Let the outputs of the M microphone elements be u1 to uM and the characteristics of filters 53_1 to 53_M be h1 to hM. The power x1² of the signal x1 is then expressed in terms of these quantities by Equation (2) (the equation itself is missing from this text).
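As a hedged reconstruction only: assuming the adder output is the sum of the element signals u_i, each filtered by h_i, Equation (2) would plausibly take a form such as the following. The convolution symbol and the overbar for a time average are notational assumptions, not taken from the source.

```latex
x_1(t) = \sum_{i=1}^{M} (h_i * u_i)(t),
\qquad
\overline{x_1^{2}} = \overline{\Bigl(\sum_{i=1}^{M} h_i * u_i\Bigr)^{2}}
\tag{2}
```

Whatever its exact published form, the property used in the following text is that the output power is quadratic in h1 to hM.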

Further, assuming that the speech and the noise are uncorrelated with each other, the following equation holds:

x1² = s² + n²    (3)

From Equations (2) and (3), the power n² of the noise component contained in x1 is a quadratic function of the filter characteristics h1 to hM. The filter control problem of minimizing the noise power n² under a constraint therefore reduces to the well-known problem of minimizing a quadratic function subject to a constraint.
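A minimal numerical sketch of such a constrained quadratic minimization is given below. It is not the algorithm of the patent or of the cited references: it assumes a two-element array, real scalar weights in place of filters, a known noise covariance matrix, and a textbook linear constraint (unit gain for the speech direction), for which the minimizer is w = R⁻¹a / (aᵀR⁻¹a). All names and numbers are hypothetical.

```python
def solve_2x2(r, v):
    """Solve r @ x = v for a 2x2 matrix r given as nested lists."""
    (a, b), (c, d) = r
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def constrained_min_noise_weights(r_noise, steer):
    """Weights minimizing w^T R w subject to steer . w = 1."""
    x = solve_2x2(r_noise, steer)
    scale = sum(s * xi for s, xi in zip(steer, x))
    return [xi / scale for xi in x]


def output_noise_power(w, r_noise):
    """Quadratic form w^T R w: noise power at the adder output."""
    return sum(w[i] * r_noise[i][j] * w[j] for i in range(2) for j in range(2))


speech_steer = [1.0, 1.0]   # assumed: speech reaches both elements with equal gain
noise_steer = [1.0, 0.5]    # assumed: directional noise reaches them with unequal gain
noise_var, sensor_var = 4.0, 0.01
r_noise = [[noise_var * noise_steer[i] * noise_steer[j] + (sensor_var if i == j else 0.0)
            for j in range(2)] for i in range(2)]

w = constrained_min_noise_weights(r_noise, speech_steer)
```

The resulting weights keep unit gain for the speech while almost cancelling the directional noise, so the output noise power is far below that of plain averaging with weights [0.5, 0.5].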

Solution methods and concrete algorithms for various constraints are described in detail in the literature (R. A. Monzingo et al., "Introduction to Adaptive Arrays", John Wiley & Sons, New York, 1980) and in U.S. Pat. No. 4,536,887.

Reducing the noise component contained in x1 in this way amounts to reducing the sensitivity of the array system in the direction from which the noise arrives. The array system consequently forms a directivity pattern with high sensitivity in the target direction and low sensitivity in the direction of the noise source.

FIG. 5 shows an example 66 of the directional characteristic formed by the adaptive array. In FIG. 5, reference numeral 3 denotes the speaker, as in the preceding embodiments, and 63 and 64 denote noise sources. As FIG. 5 shows, the adaptive array does not have a sharp directional characteristic, but it does realize a directional characteristic with low sensitivity in the directions of the noise sources. The low-sensitivity portions of this characteristic are called "blind spots"; when the microphone array consists of M elements, the array system can form M-1 blind spots.

An adaptive array that forms such a directional characteristic yields a lower SN ratio than a super-directional sound receiver when large amounts of noise reflected within the room arrive from directions other than that of the noise source. However, it provides a nearly constant SN ratio regardless of the position of the noise source, and, because it has no sharp directivity toward the speaker 3, its SN ratio varies little when the speaker 3 moves. These features make it a sound receiver very well suited to ensuring the stability of the SN-ratio difference that is required when speech sections are detected from the power difference between two signals.

In addition, the adaptive array has the feature of reducing the temporal variation of the noise power. This is explained with reference to FIGS. 6(a) and 6(b). In a room, noise reflected from the walls, floor, ceiling and so on generally enters the sound receiver from directions other than that of the noise source. The adaptive array cannot form blind spots in all of those noise directions; when the microphone array consists of M microphone elements, it improves the SN ratio by forming at most M-1 blind spots in the directions from which the direct sound and the high-energy reflected sounds arrive.

This effect is described with reference to FIGS. 6(a) and 6(b). FIG. 6(a) shows the signal of a pulse-like noise received by an omnidirectional microphone, and FIG. 6(b) shows the signal of a pulse-like noise received by an adaptive array. In FIG. 6(a), 71 is the noise received directly from the noise source, and 72, 73 and 74 are noise received after being reflected one or more times by the walls, floor and so on. Compared with the energy of the direct sound 71, the energy of the reflected sounds 72, 73 and 74 decays exponentially with time. If the array consists of four microphone elements, the adaptive array forms three blind spots: one in the direction of the noise source and two in the directions of the reflected sounds 72 and 73. Consequently, in the adaptive array output of FIG. 6(b), the power of the reflected sound 74 differs little from that received by the omnidirectional microphone, whereas the power of the direct sound and of the reflected sounds 72 and 73 is greatly reduced. As a result, the temporal variation of the noise power becomes small.

As stated earlier, a major cause of erroneous detection of speech sections is large temporal variation in the noise power. Speech section detection using the power difference between two signals is performed to cope with this variation, but because the various causes of SN-ratio variation cannot be removed completely, erroneous detection cannot be avoided entirely. The feature of the adaptive array used in the present invention, namely that it reduces the temporal variation of the noise power, is therefore very effective in further reducing erroneous detection of speech sections.

For the second sound receiver 42 in the configuration example of the present invention shown in FIG. 1, the simplest approach is to use one of the microphone elements constituting the microphone array 51. This example is shown in FIG. 7, described later.

Alternatively, as shown in FIG. 10, the second signal x2 can be obtained by feeding the outputs of some of the microphones in the microphone array 51 of the first sound receiver 41 into a synthesizer 52A and taking its output.

Another example of a microphone array system with a directivity control function is the sound receiving system shown in U.S. Pat. No. 791,418. In this system, signal processing is performed so as to preserve a speech signal whose direction of arrival is well defined and to reduce noise arriving uniformly from the surroundings. For this system to work well, the positions of the speaker and the noise source must not coincide (their directions as seen from the microphones may be the same). Because it extracts only the sound from a source at a desired position, it can be regarded as a kind of directivity control.

FIG. 7 illustrates the first embodiment of the present invention shown in FIG. 1 in more detail. In the figure, as in the preceding embodiments, 51 is the microphone array, 52 is the directivity control unit, 43 is the first short-time power calculation unit, 44 is the second short-time power calculation unit, and 45 is the speech section detection unit based on the power difference. In addition, 81 is a first amplifier that is connected to the output side of the directivity control unit 52, receives the signal x1 and sends its output to the power calculation unit 43; 82 is a second amplifier that is connected to the microphone 42 (in this example, one of the microphone elements constituting the microphone array 51), receives the signal x2 and sends its output to the power calculation unit 44; 83 is a differencer that receives the outputs P1 and P2 of the power calculation units 43 and 44; 84 is a power-based determination unit that receives the output P1 of the power calculation unit 43 and judges whether a short-time section may form part of a speech section; 85 is a power-difference-based determination unit that receives the output of the differencer 83; and 86 is a speech section candidate verification unit, or speech section determination unit, that receives the output S1 of the power-based determination unit 84 and the output S2 of the power-difference-based determination unit 85.

The method is carried out as follows. First, speech with superimposed noise is received by the microphone array 51. The output signals of the microphone array 51 are input to the directivity control unit 52, which generates the first signal x1. Meanwhile, the output of one of the microphone elements constituting the microphone array 51 is taken as x2. As a result of the directivity control performed by the directivity control unit 52, the SN ratio of x1 is higher than that of x2.

Next, the amplifiers 81 and 82 are used to correct the signal levels so that the speech power contained in x1 equals that contained in x2. This operation is not essential, but performing it simplifies the explanation that follows. The short-time power calculation units 43 and 44 then calculate and output the short-time powers P1 and P2 of x1 and x2, respectively. P1 and P2 are assumed to be expressed either as logarithmic values (dB) or as linear values.
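The short-time power calculation can be sketched as below. This is only an illustration under stated assumptions: the function name and the small floor value are hypothetical, and the default window of 30 ms and hop of 10 ms are borrowed from the experimental conditions reported later in this description.

```python
import math


def short_time_power_db(samples, sample_rate, window_ms=30.0, hop_ms=10.0, floor=1e-12):
    """Return the short-time power of `samples` in dB, one value per frame."""
    window = int(sample_rate * window_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    powers = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        mean_square = sum(x * x for x in frame) / window
        # The floor avoids log(0) for completely silent frames.
        powers.append(10.0 * math.log10(max(mean_square, floor)))
    return powers
```

Applying the same function to x1 and x2 gives the two frame-aligned sequences P1 and P2 used in the subsequent determinations.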

Next, the power P1 of the signal with the higher SN ratio is input to the power-based determination unit 84. If the value of P1 is greater than a predetermined threshold Th, the determination unit 84 outputs "1" as the output S1 to indicate that the short-time section concerned may be part of a speech section; otherwise it outputs "0".

Next, the differencer 83 computes the difference PD between P1 and P2 (PD = P2 - P1) and inputs it to the power-difference-based determination unit 85. If the value of PD is smaller than a predetermined threshold Pth, the determination unit 85 outputs "1" as the output S2; otherwise it outputs "0".

Finally, the output S1 of the power-based determination unit 84 and the output S2 of the power-difference-based determination unit 85 are input to the speech section determination unit 86. When S1 and S2 are both "1", the speech section determination unit 86 judges the candidate short-time section to be part of a genuine speech section; otherwise it judges the section to be a noise section. It then outputs the result.
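The three determinations described above (units 84, 85 and 86) can be sketched per frame as follows. This is a minimal illustration assuming that P1 and P2 are given in dB on a common frame grid; the function name and the example values of P1, P2 and Th are hypothetical, while Pth = 8 dB is the value used in the experiment reported later.

```python
def detect_speech_frames(p1_db, p2_db, th_db, pth_db):
    """Per-frame decision: 1 = part of a speech section, 0 = noise section."""
    decisions = []
    for p1, p2 in zip(p1_db, p2_db):
        s1 = 1 if p1 > th_db else 0    # power-based determination (unit 84)
        pd = p2 - p1                   # differencer 83: PD = P2 - P1
        s2 = 1 if pd < pth_db else 0   # power-difference determination (unit 85)
        decisions.append(1 if s1 == 1 and s2 == 1 else 0)  # determination unit 86
    return decisions


# Hypothetical frame powers in dB.
p1 = [-40.0, -10.0, -12.0, -15.0]
p2 = [-30.0, -9.0, -11.0, -3.0]
flags = detect_speech_frames(p1, p2, th_db=-20.0, pth_db=8.0)  # [0, 1, 1, 0]
```

In this example the first frame is rejected because P1 is below Th, and the last frame is rejected because P2 exceeds P1 by more than Pth, which is the signature of noise from a direction the array suppresses.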

Next, the operation of the speech section detection unit 45 based on the power difference is described with reference to FIGS. 8(a), 8(b) and 8(c). FIG. 8(a) shows the temporal change of the power P1 at the output of the first sound receiver, FIG. 8(b) shows the temporal change of the power P2 at the output of the second sound receiver, and FIG. 8(c) shows the difference PD between P2 and P1 (PD = P2 - P1). In each figure the vertical axis represents the short-time power of the signal and the horizontal axis represents time. As in the examples described earlier, 11 denotes stationary noise, 121 and 122 denote non-stationary noise, and 13 denotes speech.

Because the speech powers contained in P1 and P2 have been adjusted to be equal, the power in the speech section is nearly the same in FIGS. 8(a) and 8(b), where power is plotted logarithmically, provided that the stationary noise power in P2 is somewhat smaller than the speech power. On the other hand, because the output of the second sound receiver has a lower SN ratio than the output of the first sound receiver, the noise power in FIG. 8(b) is larger than the noise power in FIG. 8(a) by an amount corresponding to the difference in SN ratio. As a result, the power difference PD between P2 and P1 shown in FIG. 8(c) is zero in speech sections and nonzero in non-speech sections.

In a real environment, however, the difference in SN ratio is subject to various causes of variation, as described above, so PD does not always take such ideal values even in the present invention, which uses a microphone array system with a directivity control function to reduce those causes. For example, movement of the speaker beyond the expected range makes PD greater than zero even in a speech section. Conversely, for noise arriving from the same direction as the speech (for example, the speaker clicking the tongue or turning over paper), PD becomes zero in that noise section even if the noise has relatively low power.

Taking these points into account, in the present invention the power-based determination unit 84 first judges any short-time section whose power is below the threshold Th to be a non-speech section, as shown in FIG. 8(a). As a result, even if, for example, the noise denoted 122 arrives from the same direction as the speech and PD is small in that noise section, the noise section is not erroneously detected as a speech section, and highly effective speech section detection is achieved.

As shown in FIG. 1, the speech section determination unit 86 of FIG. 7 may include two parts. The first is a speech section candidate verification unit 86a, which judges a short-time section to be a speech section when the output S1 of the power-based determination unit 84 and the output S2 of the power-difference-based determination unit 85 are both "1". The second is a section verification unit 86b, which judges a time section to be a speech section only when the time section judged to be speech by the verification unit 86a continues for longer than the predicted minimum duration of speech.
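The duration check performed by the section verification unit 86b can be sketched as a filter over the per-frame decisions. The function name and the representation of the minimum duration as a frame count are assumptions made for illustration.

```python
def enforce_min_duration(decisions, min_frames):
    """Keep runs of 1s only if they last longer than `min_frames` frames."""
    result = [0] * len(decisions)
    start = None
    # A sentinel 0 is appended so that a run reaching the last frame is closed.
    for i, d in enumerate(list(decisions) + [0]):
        if d == 1 and start is None:
            start = i
        elif d != 1 and start is not None:
            if i - start > min_frames:
                for k in range(start, i):
                    result[k] = 1
            start = None
    return result


cleaned = enforce_min_duration([1, 1, 0, 1, 1, 1, 1, 0, 1], min_frames=2)
# cleaned == [0, 0, 0, 1, 1, 1, 1, 0, 0]
```

Short bursts, such as those caused by pulse-like noise that passes both determinations, are discarded, while the longer run is kept.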

The following experiment was conducted to confirm the effectiveness of the present invention.

(Experimental conditions) The experiment was conducted in a room with a reverberation time of 0.4 seconds. As noise, interfering speech (radio news) was played from a loudspeaker. Spoken words (city names) were used as the desired speech, and 100 words uttered under different interfering speech were collected. The speaker and the noise source were positioned 45 degrees apart as seen from the sound receiver. As sound receiver 1, an AMNOR sound receiving device, which is a type of adaptive array, was used (reference: Y. Kaneda and J. Ohga, "Adaptive Microphone-array System for Noise Reduction," IEEE Trans. on Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 1391-1400, Dec. 1986). The AMNOR sound receiving device is realized by combining a microphone array consisting of multiple microphone elements with digital filters, and achieves an SN ratio about 10 to 16 dB higher than that of a single microphone element. As sound receiver 2, one of the microphone elements constituting the microphone array was used. The short-time power was computed every 10 ms with a window length of 30 ms.

The threshold Th of the power-based determination unit 84 was set as follows: each utterance was captured with a fixed length (1 second), the difference PMM between the maximum and minimum short-time power within it was obtained, and Th was set to PMM × 0.5. The threshold Pth for PD was set to 8 dB.

As the correct (reference) speech sections, the sections obtained by applying the first conventional method (which uses only the power-based determination) to the noise-free speech were used.

(Experimental results) Under the above conditions, a word section detection experiment was conducted with the SN ratio of the speech at the receiving point set to -5 dB at the output of sound receiver 2.

FIG. 9 shows an example of the experimental results. FIG. 9(a) shows the speech power in the absence of noise and the correct speech sections. FIG. 9(b) shows the power P2 at the output of the second sound receiver when the interfering speech is added. FIG. 9(c) shows the power P1 at the output of the first sound receiver (the AMNOR sound receiving device) when the interfering speech is added, together with the selected speech section candidates. The hatched portions indicate erroneously detected speech sections. Comparing FIGS. 9(b) and 9(c) shows that the temporal variation of the noise power marked with △ in FIG. 9(b) is smaller in FIG. 9(c), the output of the adaptive array. That is, the sharp peaks in the temporal change of the power have been flattened.

FIG. 9(d) shows with arrows the sections judged to be word sections by the method of the present invention. In FIGS. 9(c) and 9(d), a non-speech section of 200 ms or less lying between detected speech sections was regarded as part of the word section. The hatched portion is an erroneously detected section (a speech section judged to be a noise section). The figure confirms that the method of the present invention operates largely satisfactorily.
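The rule of treating short non-speech gaps as part of a word section can be sketched as follows. The function name is hypothetical; with the 10 ms frame interval used in the experiment, a 200 ms gap corresponds to 20 frames.

```python
def bridge_short_gaps(decisions, max_gap_frames):
    """Fill runs of 0s that lie between two 1s and are at most `max_gap_frames` long."""
    result = list(decisions)
    last_one = None  # index of the most recent speech frame
    for i, d in enumerate(decisions):
        if d == 1:
            if last_one is not None and 0 < i - last_one - 1 <= max_gap_frames:
                for k in range(last_one + 1, i):
                    result[k] = 1
            last_one = i
    return result


bridged = bridge_short_gaps([0, 1, 0, 0, 1, 0, 0, 0, 1, 0], max_gap_frames=2)
# bridged == [0, 1, 1, 1, 1, 0, 0, 0, 1, 0]
```

Only gaps enclosed by speech on both sides are filled; leading and trailing non-speech frames are left unchanged.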

To evaluate the results quantitatively, a detection was counted as correct when the errors at both the start and the end of the word section were within 50 ms, and the rate of correct detections was computed. When the first conventional method, which is the most widely used method even in current speech recognition devices, was applied to the high-SN-ratio output of AMNOR, the rate of correct detections was 43%. In contrast, the method of the present invention achieved a rate of 96%, with an average detection error at the start and end points of about 20 ms. These results confirm the effectiveness of the present speech section detection method.
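The scoring criterion described above can be sketched as follows. The function names and the example sections are hypothetical and serve only to illustrate the 50 ms tolerance rule; they are not the experimental data.

```python
def is_correct(detected, reference, tolerance_ms=50.0):
    """True if both the start and the end of `detected` lie within the tolerance."""
    return (abs(detected[0] - reference[0]) <= tolerance_ms
            and abs(detected[1] - reference[1]) <= tolerance_ms)


def correct_rate(detected_sections, reference_sections, tolerance_ms=50.0):
    """Fraction of words whose detected (start, end) pair counts as correct."""
    hits = sum(is_correct(d, r, tolerance_ms)
               for d, r in zip(detected_sections, reference_sections))
    return hits / len(reference_sections)


# Hypothetical (start, end) times in milliseconds for two words.
reference = [(100.0, 600.0), (1000.0, 1500.0)]
detected = [(120.0, 580.0), (1000.0, 1600.0)]
rate = correct_rate(detected, reference)  # 0.5: the second end point is off by 100 ms
```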

Also, when a unidirectional microphone was used as the first sound receiver, as shown for example in FIG. 2(a), and a noise source was located within substantially 90 degrees of the speaker direction, measured about the microphone from the straight line connecting the speaker and the microphone, the rate of correct word-section detection was only about 10%. This confirms that the present invention is a high-accuracy acoustic signal detection method. With the present invention, the 96% detection result described above is obtained except within ±30° of the straight line connecting the speaker and the microphone.

For applications that can tolerate some degradation in performance, a sound receiver consisting of a so-called super-directional sound receiver and a selection filter can also be used as the first sound receiver of the present invention. FIG. 12 shows an example of such a configuration. In FIG. 12, 51 is the microphone array, 91 is an adder for realizing super-directivity, and 92 is a processing filter. As described earlier, with a super-directional sound receiver the SN ratio varies greatly in the low- and high-frequency ranges. The processing filter mitigates this problem by extracting only the frequency band in which sensitivity is high within the range over which the speaker is expected to move and low outside that range. The drawback of this approach is that the band in which the SN ratio varies little does not necessarily coincide with the band in which speech energy is high, so the SN ratio of the output of the first sound receiver falls and erroneous selection of speech section candidates increases. Its advantage is that the system configuration is simple.

The present invention makes no use of properties specific to speech signals. For speech section detection, however, it is very effective to combine the present invention with determination methods that exploit the properties of speech signals.

In fact, the first conventional method is not used on its own; it is usually combined with determination methods that exploit the properties of speech signals. For example, a known method uses a predicted value Tc of the minimum duration of a speech signal and judges speech section candidates shorter than Tc to be noise. Combining this determination to remove the influence of pulse-like noise is very effective in speech section detection. Many other determination methods are also known, such as one that exploits the periodicity of speech signals and judges sections in which the signal is aperiodic to be non-speech. These conventional methods can easily be combined with the present invention, for example by taking the sections judged to be speech by the present invention as input and re-judging them, or by making the final decision on speech sections by majority vote among the results of multiple determinations including that of the present invention.
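The majority-vote combination mentioned above can be sketched per frame as follows. The function name and the three example decision tracks are hypothetical; each track stands for the 0/1 output of one detector, such as the present method, a minimum-duration check or a periodicity check.

```python
def majority_vote(*decision_tracks):
    """Combine per-frame 0/1 decisions from several detectors by majority vote."""
    n = len(decision_tracks)
    # A frame is speech only if strictly more than half of the detectors say so.
    return [1 if 2 * sum(frame) > n else 0 for frame in zip(*decision_tracks)]


power_difference = [1, 1, 0, 0]
duration_based = [1, 0, 1, 0]
periodicity_based = [0, 1, 1, 0]
combined = majority_vote(power_difference, duration_based, periodicity_based)
# combined == [1, 1, 1, 0]
```

With an even number of detectors this sketch resolves ties as non-speech, which is a design choice made here rather than something the description specifies.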

Thus the present invention can be combined with many previously known speech section detection methods, which makes it possible to achieve a large improvement in detection performance according to the intended use.

The first field of application of the present invention is speech recognition devices, as described above. The second is acoustic echo cancellers. An acoustic echo canceller is a technique for preventing problems such as howling that arise, for example in a loudspeaking telephone system, when sound from the receiving loudspeaker leaks into and is picked up by the transmitting microphone. Its principle is to estimate the acoustic transfer characteristic from the receiving loudspeaker to the transmitting microphone and, based on the estimate, to subtract the loudspeaker sound component from the signal picked up by the transmitting microphone. Because this transfer characteristic changes over time, the estimation must be performed continually, but it requires that the talker not be speaking while it is performed (otherwise a large estimation error results). Judging whether the talker is speaking does not always work well, however, and this is one of the current problems of the technology.

To address this problem, the present invention can be applied by treating the talker's speech as the target speech and the speech from the receiving loudspeaker as unwanted speech. If the talker is regarded as speaking at any time at which the target speech is judged to be present in a time section, and the estimation of the transfer characteristic is halted at those times, a high-performance acoustic echo canceller that solves the above problem can be realized.
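The idea of halting the estimation while the talker is judged to be speaking can be sketched with a gated adaptive filter. The description does not name an estimation algorithm; the normalized LMS (NLMS) update used here, the function name, the per-sample speech flags and the two-tap echo path are all assumptions chosen for illustration.

```python
import random


def gated_nlms(far_end, mic, speech_flags, taps=4, mu=0.5, eps=1e-8):
    """Echo-path estimation by NLMS, frozen whenever the speech flag is 1."""
    w = [0.0] * taps          # current estimate of the echo path
    history = [0.0] * taps    # most recent loudspeaker samples, newest first
    residual = []
    for x, d, talking in zip(far_end, mic, speech_flags):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_estimate          # microphone signal with echo removed
        residual.append(e)
        if not talking:                # adapt only when no target speech is detected
            norm = sum(xi * xi for xi in history) + eps
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
    return w, residual


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.5, -0.3]   # hypothetical loudspeaker-to-microphone response
mic = [sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(far_end))]
w_adapt, _ = gated_nlms(far_end, mic, [0] * len(far_end))   # talker silent: adapts
w_frozen, _ = gated_nlms(far_end, mic, [1] * len(far_end))  # talker active: frozen
```

In practice the flags would come from the speech section detector on a frame basis; per-sample flags are used here only to keep the sketch short.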

The third field of application is speech storage. For example, when a large amount of continuously uttered speech is digitized and recorded on a magnetic disk or similar medium, information compression by speech coding is important, but so is detecting non-speech sections and either discarding them or recording them with a particularly small amount of information. The present invention can be applied to the detection of non-speech sections in such techniques.

Furthermore, because the method of the present invention does not rely on properties specific to speech signals, any sound other than speech (for example music, machine sounds or impact sounds) can be chosen as the sound to be detected. The method can therefore be applied in a wide variety of forms, including various monitoring and measuring devices.

[Effects of the Invention] As described above, the method of the present invention determines whether a desired signal is present by using the difference in short-time power between the signals received by a first sound receiver (a microphone array system with a directivity control function) and a second sound receiver installed at the same location. It therefore makes it possible to detect desired speech sections in a non-stationary noise environment, which was impossible with conventional methods of this kind.

[Brief description of the drawings]

FIG. 1 is a block diagram for explaining an embodiment of the acoustic signal detection method according to the present invention.
FIG. 2 is a diagram for explaining the problems that arise when a unidirectional microphone and an omnidirectional microphone are used.
FIG. 3 is a diagram for explaining the problems that arise when a super-directional sound receiver is used.
FIG. 4 is a block diagram showing a specific example of the first sound receiver of FIG. 1.
FIG. 5 is a diagram showing the directional characteristic of an adaptive array.
FIG. 6 is a waveform diagram showing the received signal waveforms of pulse-like noise when an omnidirectional microphone and an adaptive array are used.
FIG. 7 is a block diagram showing the embodiment of FIG. 1 more specifically.
FIG. 8 is a graph for explaining the operation of the speech section detection unit shown in FIG. 7.
FIG. 9 is a diagram showing the results of an experiment confirming the effectiveness of the present invention.
FIGS. 10 to 12 are block diagrams showing other embodiments of the present invention.
FIG. 13 is a graph showing a first example of a conventional speech section detection method.
FIG. 14 is a diagram showing an example of microphone placement for explaining a second example of a conventional speech section detection method.
FIG. 15 is a graph for explaining the ideal operation of the second conventional method.
FIG. 16 is a graph showing the positional relationship between the microphones and a noise source.
FIG. 17 is a graph for explaining the problem of the second conventional method.
FIG. 18 is a diagram showing the positional relationship between the microphones and a noise source.
FIG. 19 is a block diagram showing a third example of a conventional speech section detection method.
FIG. 20 is a graph for explaining the problems of the third example shown in FIG. 19.

41, 42: sound receivers; 43, 44: short-time power calculation units; 45: speech section detection unit; 51: microphone array; 52: directivity control unit; 84, 85: determination units; 86: speech section determination unit.

Claims

(57) [Claims]

1. An acoustic signal detection method using first and second sound receivers that are provided at substantially the same position, have different directional characteristics, and differ in the power ratio of a target signal to noise (SN ratio), wherein, when the difference or ratio between the powers of the signals output from these sound receivers in a certain time interval is within a predetermined range, it is determined that the target signal has been received in that time interval, and wherein the first sound receiver comprises an adaptive microphone array system including a microphone array composed of a plurality of microphone elements and a directional characteristic control circuit arranged at a subsequent stage to control the directional characteristic according to the noise position.

2. The acoustic signal detection method wherein, when the difference or ratio between the powers of the two signals in a certain time interval is within a predetermined range and the power, in that time interval, of the signal output from the sound receiver having the higher SN ratio is also within a predetermined range, it is determined that the target signal has been received in that time interval.

3. The acoustic signal detection method according to claim 1, wherein said second sound receiver is also constituted by a microphone array system.

4. The acoustic signal detection method according to claim 1, wherein, when the time section in which it is determined that the target signal has been received continues beyond a predicted value of the minimum duration of voice, it is determined that the target signal has been received in this time section.

5. The acoustic signal detection method according to claim 1, wherein the second sound receiver uses one microphone element that is a component of the microphone array constituting the first sound receiver.

6. The acoustic signal detection method according to claim 4, wherein the second sound receiver shares several of the microphone elements constituting the first sound receiver and further comprises means for synthesizing the outputs of those microphone elements.

7. An acoustic signal detection method using first and second sound receivers that are provided at substantially the same position, have different directional characteristics, and differ in the power ratio of a target signal to noise (SN ratio), wherein, when the difference or ratio between the powers of the signals output from these sound receivers in a certain time interval is within a predetermined range, it is determined that the target signal has been received in that time interval, and wherein the first sound receiver includes a directional microphone array in which a plurality of microphones are arranged, a synthesizer that receives the output of each microphone and synthesizes superdirectivity, and a band selection filter that receives the output of the synthesizer and passes a predetermined band component.
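The decision rule recited in the claims above can be sketched informally as follows. This is only an illustration under stated assumptions: the function name `detect_target_frames`, the convention of dividing the power of the higher-SN-ratio receiver by that of the other receiver, the numeric ranges, and the frame-count form of the minimum duration are all hypothetical and are not taken from the claims.

```python
def detect_target_frames(p_high, p_low, ratio_range,
                         power_range=None, min_frames=1):
    """Flag frames in which the target signal is judged to be present.

    p_high, p_low : per-frame short-time powers of the receiver with the
                    higher SN ratio and of the other receiver (assumed inputs).
    ratio_range   : (lo, hi) range the power ratio must fall within.
    power_range   : optional (lo, hi) range for p_high itself.
    min_frames    : shortest run of flagged frames that is kept.
    """
    lo, hi = ratio_range
    flags = []
    for ph, pl in zip(p_high, p_low):
        # Power-ratio test between the two receivers.
        ok = pl > 0 and lo <= ph / pl <= hi
        # Optional additional test on the higher-SN-ratio receiver's power.
        if ok and power_range is not None:
            ok = power_range[0] <= ph <= power_range[1]
        flags.append(ok)

    # Keep only runs of consecutive flagged frames lasting at least min_frames.
    result = [False] * len(flags)
    run_start = None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a final run
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_frames:
                for j in range(run_start, i):
                    result[j] = True
            run_start = None
    return result

# Hypothetical per-frame powers: frames 1 to 3 and frame 5 pass the ratio test.
p_high = [0.01, 1.0, 0.9, 1.1, 0.02, 1.0, 0.01]
p_low = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

detected = detect_target_frames(p_high, p_low, ratio_range=(0.5, 2.0),
                                power_range=(0.1, 10.0), min_frames=2)
```

With these illustrative settings the isolated single frame that passes the ratio test is discarded by the minimum-duration check, while the three-frame run is kept.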