JP3384540B2

JP3384540B2 - Receiving method, apparatus and recording medium

Info

Publication number: JP3384540B2
Application number: JP26465297A
Authority: JP
Inventors: 茂明青木; 真理子青木; 学岡本; 弘行松井
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1997-03-13
Filing date: 1997-09-29
Publication date: 2003-03-10
Anticipated expiration: 2017-09-29
Also published as: JPH10313498A

Abstract

PROBLEM TO BE SOLVED: To sufficiently suppress howling with a comparatively simple configuration and little deterioration in sound quality. SOLUTION: A microphone 1 is placed on the talker side and a microphone 2 is placed on the side of a loudspeaker driven by the signal received from the far end. The output channel signals of the microphones 1 and 2 are each split into a plurality of frequency bands narrow enough that each band is dominated by the component of a single sound source (4). The inter-channel level difference and arrival time difference are detected for each pair of identical frequency bands and compared with per-band thresholds to determine whether the band holds the talker's voice component or some other component (601). Only the bands of the microphone 1 output that were judged to be voice are selected (602); they are then synthesized and the result is transmitted to the far end (7A).

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a receiving method, an apparatus therefor, and a program recording medium therefor, used where a signal received from the far end is radiated as an acoustic signal by a loudspeaker or the like and the voice of a talker is picked up by a microphone and transmitted to the far end. The invention suppresses the howling that occurs when the acoustic signal converted from the received signal is picked up by that microphone; in other words, it suppresses leakage of the acoustic version of the received signal into the signal transmitted to the far end.

[0002]

2. Description of the Related Art Conventional methods for suppressing unwanted acoustic feedback, and with it howling, fall broadly into three types. The first detects the frequency at which howling occurs and inserts a notch filter at that frequency into the transmitted signal (the talker's voice signal) or the received signal. Because the band component removed by the notch filter is missing from the transmitted signal, sound quality deteriorates.

[0003] The second method suppresses howling by applying frequency modulation so that the frequency characteristics of the received and reproduced signal differ from those of the picked-up and transmitted signal. The received and reproduced signal is available as an electrical signal and is therefore known exactly. The picked-up signal, by contrast, is a mixture of the signal that should be transmitted and the reproduced received signal, so the signal that should be transmitted cannot be identified reliably. As a result, modulation is applied even to the signal to be transmitted, where it is not needed for suppressing howling, and sound quality deteriorates.

[0004] The third method uses an adaptive filter to predict how the received and reproduced signal leaks into the picked-up signal to be transmitted. Howling is suppressed by subtracting the predicted leaked component of the reproduced signal from the picked-up signal. However, the adaptive filter used for the prediction changes from moment to moment, and the prediction is possible only when there is no signal to be picked up and transmitted, that is, when only the reproduced signal is present. As with the second method, the picked-up signal is usually a mixture of the signal to be transmitted and the reproduced received signal, so it becomes necessary to establish reliably that no signal to be transmitted is present.

[0005]

Problem to be Solved by the Invention The conventional techniques therefore share the problem that they cannot suppress howling while keeping the deterioration in sound quality small.

[0006]

Means for Solving the Problem The sound pickup method of the present invention uses a plurality of microphones placed apart from one another. In a band division step, the output channel signal of each microphone is divided into a plurality of frequency bands such that each band mainly contains the component of a single sound source. For each identical band of the divided output channel signals, the difference between channels in a parameter of the acoustic signal arriving at the microphones that varies with microphone position, namely level (power) or arrival time (phase), is detected as a per-band inter-channel parameter difference. In a voice signal determination step, the per-band inter-channel parameter difference of each band is compared against a preset threshold to determine whether the band-divided output channel signal is a voice component of the talker. Based on that determination, a voice signal selection step selects, from the band-divided output channel signals, at least one voice signal originating from the same talker. The band signals selected as coming from the same talker are combined into a voice signal in a voice synthesis step, and the synthesized voice signal is transmitted to the far end.

[0007] According to an embodiment of the present invention, the signal received from the far end is divided into a plurality of bands narrow enough that some bands contain only a negligibly small level, and the level of the received signal in each divided band is detected. In a transmittable band determination step, each band whose detected level is at or below a predetermined value is judged to be a transmittable band. In a transmittable selection step, only the bands judged transmittable are selected from among the band signals chosen in the voice signal selection step and passed on to the voice synthesis step.
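The transmittable-band check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the threshold value, and the use of complex numbers for band components are all assumptions made for the example.

```python
def transmittable_bands(received_band_levels, level_threshold):
    """Return one flag per band: True where the far-end (received) signal
    level is at or below the threshold, so the band may be used for sending."""
    return [level <= level_threshold for level in received_band_levels]


def keep_transmittable(selected_voice_bands, transmittable):
    """Zero every selected voice band that is not flagged as transmittable."""
    return [band if ok else 0j
            for band, ok in zip(selected_voice_bands, transmittable)]


# Hypothetical per-band levels of the received signal (arbitrary units).
received_levels = [0.02, 0.90, 0.01, 0.40]
# Hypothetical band components already selected as talker voice.
voice_bands = [1 + 1j, 2 + 0j, 0.5j, 3 - 1j]

mask = transmittable_bands(received_levels, level_threshold=0.05)
to_synthesis = keep_transmittable(voice_bands, mask)
print(mask, to_synthesis)
```

Bands in which the far-end signal is strong are withheld from transmission, so the loop gain in exactly those bands is removed.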

[0008] The selection in the transmittable selection step may instead be achieved by performing the determination of the voice signal determination step only on the bands judged transmittable. According to another embodiment of the present invention, the received signal is divided into a plurality of frequency bands; the band-divided received signal components that correspond to the bands selected in the voice signal selection step are removed in a frequency component removal step; the remaining band components of the received signal are combined into a time-domain signal in a resynthesis step; and the resulting signal is supplied to the electroacoustic conversion means.

[0009]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 shows the basic configuration used in the receiving apparatus of the present invention. In FIG. 1, a loudspeaker 211 is provided in a room 210 as electroacoustic conversion means. The voice signal of the far-end talker (the received signal), sent over a transmission line 212, is reproduced by the loudspeaker 211 and radiated into the room 210 as an acoustic signal. Meanwhile, the voice uttered by a talker 215 in the room 210 is picked up by a microphone 1 and transmitted as an electrical signal to the far-end talker over a transmission line 216. In this arrangement, howling occurs when the sound emitted from the loudspeaker 211 is captured by the microphone 1 and transmitted to the far end.

[0010] In this embodiment, therefore, a microphone 2 is placed alongside the microphone 1, separated from it by, for example, about 20 cm, along a line roughly parallel to the direction in which the loudspeaker 211 and the talker 215 are arranged, with the microphone 2 on the loudspeaker 211 side. The microphones 1 and 2 are connected to a sound pickup processing unit 220. FIG. 2 shows a specific example of the sound pickup processing unit 220. The output of the microphone 1 is called the L channel signal and the output of the microphone 2 the R channel signal. The L and R channel signals are supplied to an inter-channel time difference / level difference detection unit 3 and to a band division unit 4. The band division unit 4 divides each channel into a plurality of frequency band signals and supplies them to a per-band inter-channel time difference / level difference detection unit 5 and to a sound source determination and signal selection unit 6. According to the detection outputs of the detection units 3 and 5, the selection unit 6 sorts the channel signal in each band into either the talker's voice component or the acoustic component from the loudspeaker. The talker voice components selected band by band are combined in a voice signal synthesis unit 7A, so that only the talker's voice signal is extracted.

[0011] Because the talker 215 is closer to the microphone 1 than to the microphone 2, the talker's voice reaches the microphone 1 earlier than the microphone 2 and at a higher level. Conversely, because the loudspeaker 211 is closer to the microphone 2 than to the microphone 1, the acoustic signal from the loudspeaker 211 reaches the microphone 2 earlier than the microphone 1 and at a higher level. The present invention thus exploits the variation, caused by the positions of the sound sources (the talker and the loudspeaker) relative to the microphones 1 and 2, in the acoustic signals arriving at the two microphones; in this example, the arrival time difference and the level difference between the two signals.
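To give a sense of scale for the arrival time difference, the short calculation below works out the largest delay that a microphone spacing of about 20 cm can produce. The speed of sound and the sampling rate are assumptions chosen for illustration; neither value appears in the text.

```python
SPEED_OF_SOUND_M_S = 340.0   # assumed; not stated in the text
MIC_SPACING_M = 0.20         # "about 20 cm" from the description
SAMPLING_RATE_HZ = 8000.0    # assumed telephone-band sampling rate

# The delay is largest when sound travels along the line joining the
# two microphones, which is roughly the arrangement described.
max_delay_s = MIC_SPACING_M / SPEED_OF_SOUND_M_S
max_delay_ms = 1000.0 * max_delay_s
max_delay_samples = max_delay_s * SAMPLING_RATE_HZ

print(f"{max_delay_ms:.3f} ms, {max_delay_samples:.2f} samples")
```

Under these assumptions the delay is a little under 0.6 ms, or fewer than five samples, which bounds the range of lags worth searching when the time difference is estimated.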

[0012] The voice signal determination unit 201 compares the differences in each band with that band's threshold, for example 0. When the level difference and the arrival time difference are greater than 0, the component in that band is judged to be the talker's voice component; when they are less than 0, it is judged to be the loudspeaker's acoustic component. This assumes that the difference detection unit 5 subtracts the level and arrival time obtained from the output signal of the microphone 2 from the level and arrival time obtained from the output signal of the microphone 1.
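The threshold comparison can be sketched for the level difference alone, as below. The function names are hypothetical, the levels are expressed in decibels as an assumption, and a difference of exactly 0 (which the text does not cover) is assigned to the loudspeaker here.

```python
import math


def level_db(power):
    """Band power (must be > 0) expressed in decibels."""
    return 10.0 * math.log10(power)


def classify_bands(mic1_band_power, mic2_band_power, threshold_db=0.0):
    """Label each band by the sign of (microphone 1 level - microphone 2 level)."""
    labels = []
    for p1, p2 in zip(mic1_band_power, mic2_band_power):
        diff = level_db(p1) - level_db(p2)   # microphone 1 minus microphone 2
        labels.append("talker" if diff > threshold_db else "loudspeaker")
    return labels


# Hypothetical per-band powers at the two microphones.
labels = classify_bands([4.0, 1.0, 9.0], [1.0, 4.0, 3.0])
print(labels)
```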

[0013] Only for the bands judged in this way to be the talker's voice, a voice signal selection unit 602 selects the band components of the microphone 1 signal. The voice signal synthesis unit 7A converts the selected band components into a time-domain signal, that is, a synthesized voice signal, and sends it to the transmission line 216. An example of a technique for extracting the talker's voice signal separately from the loudspeaker's acoustic signal is described concretely below. In what follows, the talker 215 and the loudspeaker 211 are called sound sources A and B respectively, because the technique also applies, for example, where there are several talkers and their voice signals are separated so that only one of them, or several, are transmitted.

[0014] As shown in FIG. 3, signals from the two sound sources A and B are captured by the microphones 1 and 2 (S01). The inter-channel time difference / level difference detection unit 3 detects the inter-channel time difference or level difference from the L and R channel signals. The case where the cross-correlation function between the L and R channel signals is used as the parameter for detecting the time difference is described here. As shown in FIG. 4, samples L(t) and R(t) of the L and R channel signals are first read (S02), and the cross-correlation function between these samples is calculated (S03). The calculation obtains the cross-correlation of the two channel signals at the same sample instant, then with one channel shifted relative to the other by one sample, by two samples, and so on; these values together give the cross-correlation function. Many such cross-correlations are obtained and a histogram of them, normalized by power, is built (S04). Next, the sample lags Δα₁ and Δα₂ that rank first and second in cumulative frequency in the histogram are found (S05). These lags Δα₁ and Δα₂ are converted into the inter-channel time differences Δτ₁ and Δτ₂ by the following equations and output (S06).

[0015]

$$\Delta\tau_1 = 1000 \times \Delta\alpha_1 / F \tag{1}$$

$$\Delta\tau_2 = 1000 \times \Delta\alpha_2 / F \tag{2}$$

Here F is the sampling frequency, and the factor of 1000 makes the values somewhat larger for computational convenience (with F in hertz, the result is in milliseconds). The time differences Δτ₁ and Δτ₂ are the inter-channel time differences between the L and R channel signals for the signals of sound sources A and B, respectively.
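One way to read the cross-correlation and histogram steps (S02 to S06) is sketched below. It is an interpretation under stated assumptions: the "histogram" is modelled as a per-lag sum of power-normalized cross-correlation values over many frames, a positive lag is taken to mean that the R channel lags the L channel, and all function names are hypothetical.

```python
def cross_correlation(left, right, lag):
    """Sum of left[t] * right[t + lag] over the overlapping samples."""
    total = 0.0
    for t in range(len(left)):
        u = t + lag
        if 0 <= u < len(right):
            total += left[t] * right[u]
    return total


def top_two_lags(frames, max_lag):
    """Accumulate power-normalized cross-correlation per lag over all frames
    and return the two lags with the largest accumulated score."""
    scores = {lag: 0.0 for lag in range(-max_lag, max_lag + 1)}
    for left, right in frames:
        power = (sum(x * x for x in left) * sum(y * y for y in right)) ** 0.5
        if power == 0.0:
            continue  # silent frame: nothing to correlate
        for lag in scores:
            scores[lag] += cross_correlation(left, right, lag) / power
    ranked = sorted(scores, key=lambda lag: scores[lag], reverse=True)
    return ranked[0], ranked[1]


def lag_to_ms(lag_samples, sampling_rate_hz):
    """Equations (1) and (2): delta_tau = 1000 * delta_alpha / F."""
    return 1000.0 * lag_samples / sampling_rate_hz


# Toy frames: source A reaches R two samples after L (two frames),
# source B reaches R three samples before L (one frame).
frame_a = ([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
frame_b = ([0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
alpha_1, alpha_2 = top_two_lags([frame_a, frame_a, frame_b], max_lag=4)
tau_1 = lag_to_ms(alpha_1, 8000)
tau_2 = lag_to_ms(alpha_2, 8000)
print(alpha_1, alpha_2, tau_1, tau_2)
```

The 8000 Hz sampling rate is only an example value; the text leaves F unspecified.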

[0016] Returning to FIGS. 2 and 3, the band division unit 4 divides the L channel signal and the R channel signal into the frequency band signals L(f1), L(f2), ..., L(fn) and R(f1), R(f2), ..., R(fn), respectively (S04). This division is performed, for example, by applying a discrete Fourier transform to each channel signal to convert it into a frequency-domain signal and then dividing that signal into the frequency bands. The bands are made narrow enough that, given the difference in frequency characteristics between the signals of sound sources A and B, each band mainly contains the signal component of only one source; for voice signals the division uses a bandwidth of, for example, 20 Hz. Suppose the power spectrum of sound source A is obtained as shown in FIG. 5A and that of sound source B as shown in FIG. 5B; the division then uses a bandwidth Δf fine enough to separate the individual spectral lines. In that case, as the corresponding spectral lines drawn with broken lines indicate, the spectrum of one source is negligible relative to that of the other within a band. As FIGS. 5A and 5B also show, a bandwidth of 2Δf may be used instead; that is, each band need not contain only a single spectral line. The discrete Fourier transform is performed, for example, every 20 to 40 ms.
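A band division of this kind might look like the sketch below, which uses a direct (slow) discrete Fourier transform so that the example needs only the standard library; a real system would use an FFT. The function names, the 1000 Hz sampling rate, and the 50-sample frame are assumptions chosen so that one DFT bin is exactly 20 Hz wide.

```python
import cmath
import math


def dft(frame):
    """Direct DFT of a real frame, returning bins 0 .. N/2."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def split_into_bands(frame, sampling_rate_hz, band_width_hz):
    """Group DFT bins into bands of the given width; returns {band index: [bins]}."""
    spectrum = dft(frame)
    bin_hz = sampling_rate_hz / len(frame)   # frequency spacing of the bins
    bands = {}
    for k, value in enumerate(spectrum):
        band_index = int(k * bin_hz // band_width_hz)
        bands.setdefault(band_index, []).append(value)
    return bands


# A 100 Hz cosine sampled at 1000 Hz for 50 ms (50 samples, 20 Hz per bin).
frame = [math.cos(2 * math.pi * 100 * t / 1000) for t in range(50)]
bands = split_into_bands(frame, sampling_rate_hz=1000, band_width_hz=20)
strongest = max(bands, key=lambda b: sum(abs(v) ** 2 for v in bands[b]))
print(len(bands), strongest)
```

Passing `band_width_hz=40` would group two bins per band, which corresponds to the 2Δf variant mentioned in the text.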

[0017] Next, the per-band inter-channel time difference / level difference detection unit 5 detects the per-band inter-channel time difference or level difference between the channels of each pair of corresponding band signals, for example L(f1) and R(f1), ..., L(fn) and R(fn) (S05). The per-band inter-channel time difference is determined uniquely by using the inter-channel time differences Δτ₁ and Δτ₂ detected by the inter-channel time difference detection unit 3. The equations used for this detection are as follows.

[0018]

$$\Delta\tau_1 - \left\{ \frac{\Delta\phi_i}{2\pi f_i} + \frac{k_{i1}}{f_i} \right\} = \varepsilon_{i1} \tag{3}$$

$$\Delta\tau_2 - \left\{ \frac{\Delta\phi_i}{2\pi f_i} + \frac{k_{i2}}{f_i} \right\} = \varepsilon_{i2} \tag{4}$$

Here i = 1, 2, ..., n, and Δφi is the phase difference between the signal L(fi) and the signal R(fi). The integers ki1 and ki2 are chosen so that ε_i1 and ε_i2 in these equations are minimized. The minimized ε_i1 and ε_i2 are then compared, and the inter-channel time difference Δτ_j (j = 1, 2) belonging to the smaller one is taken as the inter-channel time difference Δτ_ij of band i; that is, it is the inter-channel time difference, in that band, of one of the sound source signals.
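Equations (3) and (4) can be sketched as follows. Several points are assumptions rather than statements from the text: "minimized" is read as minimizing the absolute value of ε, the time differences are given in seconds so that they share units with Δφi/(2πfi), the sign of the phase difference is taken to match the sign of the delay, and the function names are hypothetical.

```python
import math


def best_residual(delta_tau_s, phase_diff_rad, freq_hz):
    """Smallest |delta_tau - (phase/(2*pi*f) + k/f)| over integers k, and that k."""
    base = phase_diff_rad / (2.0 * math.pi * freq_hz)
    k = round((delta_tau_s - base) * freq_hz)   # nearest whole number of periods
    return abs(delta_tau_s - (base + k / freq_hz)), k


def assign_band(delta_tau_1_s, delta_tau_2_s, phase_diff_rad, freq_hz):
    """Return 1 or 2: which candidate time difference explains this band better."""
    eps_1, _ = best_residual(delta_tau_1_s, phase_diff_rad, freq_hz)
    eps_2, _ = best_residual(delta_tau_2_s, phase_diff_rad, freq_hz)
    return 1 if eps_1 <= eps_2 else 2


# Candidate delays of +0.5 ms (source A) and -0.3 ms (source B) at 2000 Hz.
# A 0.5 ms delay is exactly one period, so the wrapped phase is 0.
band_from_a = assign_band(5e-4, -3e-4, 0.0, 2000.0)
# A -0.3 ms delay gives a phase of -1.2*pi, which wraps to +0.8*pi.
band_from_b = assign_band(5e-4, -3e-4, 0.8 * math.pi, 2000.0)
print(band_from_a, band_from_b)
```

The integer k absorbs whole periods of phase wrap, which is why a measured phase difference alone does not fix the delay and the two candidates from the cross-correlation stage are needed.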

[0019] Using the per-band inter-channel time differences Δτ_1j to Δτ_nj detected by the per-band inter-channel time difference / level difference detection unit 5, the voice signal determination unit 601 in the sound source determination and signal selection unit 6 decides, for each pair of corresponding band signals L(f1) to L(fn) and R(f1) to R(fn), which one to select (S06). The description here takes the case where, of the time differences Δτ₁ and Δτ₂ calculated by the inter-channel time difference / level difference detection unit 3, Δτ₁ is the inter-channel time difference of the signal from sound source A, which is near the L-side microphone, and Δτ₂ is that of the signal from sound source B, which is near the R-side microphone.

[0020] In this case, for a band i in which the time difference Δτ_ij calculated by the per-band inter-channel time difference / level difference detection unit 5 equals Δτ₁, the voice signal determination unit 601 opens the gate 602Li so that the L-side input signal L(fi) is output unchanged as SA(fi), and closes the gate 602Ri so that the R-side input signal R(fi) of band i is output as SB(fi) = 0. Conversely, for a band i in which Δτ_ij equals Δτ₂, the L-side signal L(fi) is output as SA(fi) = 0 and the R-side input signal R(fi) is output unchanged as SB(fi). That is, as shown in FIG. 1, the band signals L(f1) to L(fn) are supplied to a sound source signal synthesis unit 7A through the gates 602L1 to 602Ln respectively, and the band signals R(f1) to R(fn) are supplied to a sound source signal synthesis unit 7B through the gates 602R1 to 602Rn respectively. The voice signal determination unit 601 in the sound source determination and signal selection unit 6 receives Δτ_1j to Δτ_nj. For a band i in which Δτ_ij is judged to be Δτ₁, it generates the gate control signals CLi = 1 and CRi = 0, so that the corresponding gate 602Li is opened and the gate 602Ri is closed. For a band i in which Δτ_ij is judged to be Δτ₂, it generates CLi = 0 and CRi = 1, so that the gate 602Li is closed and the gate 602Ri is opened. The above is a functional description; in practice the processing is performed by, for example, a digital signal processor.
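The gate logic above reduces to a pair of complementary masks, as in the following sketch. The list-based representation and the function names are assumptions; the identifiers CL, CR, SA and SB follow the text.

```python
def gate_controls(band_assignment):
    """band_assignment[i] is 1 if band i matched delta_tau_1 (source A) and
    2 if it matched delta_tau_2 (source B). Returns the lists (CL, CR)."""
    cl = [1 if j == 1 else 0 for j in band_assignment]
    cr = [1 if j == 2 else 0 for j in band_assignment]
    return cl, cr


def apply_gates(left_bands, right_bands, cl, cr):
    """Pass L(fi) to SA(fi) where CLi = 1 and R(fi) to SB(fi) where CRi = 1;
    a closed gate outputs 0."""
    sa = [lv if c else 0j for lv, c in zip(left_bands, cl)]
    sb = [rv if c else 0j for rv, c in zip(right_bands, cr)]
    return sa, sb


# Hypothetical three-band example: bands 1 and 3 belong to A, band 2 to B.
cl, cr = gate_controls([1, 2, 1])
sa, sb = apply_gates([1 + 0j, 2 + 0j, 3 + 0j], [4 + 0j, 5 + 0j, 6 + 0j], cl, cr)
print(cl, cr, sa, sb)
```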

[0021] The sound source signal synthesis unit 7A combines the signals SA(f1) to SA(fn), which in the band division example above means applying an inverse Fourier transform, and outputs the result as the signal SA at an output terminal t_A. The sound source signal synthesis unit 7B likewise combines the signals SB(f1) to SB(fn) and outputs the result as the signal SB at an output terminal t_B. As is clear from the above, the apparatus of the present invention determines, for each of the finely divided band components of each channel signal, which sound source it comes from, and outputs every component so determined. In other words, provided the frequency components of the signals of sound sources A and B do not overlap, the processing drops no particular frequency band, so the signals of sound sources A and B can be separated with higher sound quality than with the conventional method that extracts only the harmonic structure.

[0022] In the description so far, the voice signal determination unit 601 set its decision condition using only the inter-channel time difference and the per-band inter-channel time differences detected by the inter-channel time difference / level difference detection unit 3 and the per-band inter-channel time difference / level difference detection unit 5. An embodiment that sets the decision condition using the inter-channel level difference is described next. In this embodiment, as shown in FIG. 6, the L and R channel signals are taken in from the microphones 1 and 2 (S02), and the inter-channel level difference ΔL between them is detected by the inter-channel time difference / level difference detection unit 3 (FIG. 2) (S03). As in step S04 of FIG. 3, the L and R channel signals are each divided into n per-band channel signals L(f1) to L(fn) and R(f1) to R(fn) (S04). The per-band inter-channel level differences ΔL1, ΔL2, ..., ΔLn are then detected for the corresponding bands, that is, for L(f1) and R(f1), L(f2) and R(f2), ..., L(fn) and R(fn) (S05).

[0023] Human speech can be regarded as stationary over an interval of about 20 ms to 40 ms. Every 20 ms to 40 ms, therefore, the voice signal determination unit 601 (FIG. 2) calculates in what proportion of all the bands the sign (+ or -) of the logarithm of the inter-channel level difference ΔL matches the sign of the logarithm of the per-band inter-channel level difference ΔLi. If the two have the same sign in at least a predetermined proportion of the bands, for example 80% or more (S06, S07), the decision for the following 20 ms to 40 ms is made from the inter-channel level difference ΔL alone (S08). If they have the same sign in 80% or less of the bands, the decision for the following 20 ms to 40 ms is made band by band using the per-band inter-channel level difference ΔLi (S09). When all bands are decided by the inter-channel level difference ΔL: if ΔL is positive, the L channel signal L(t) is output unchanged as the signal SA and the R channel signal R(t) is output as the signal SB = 0; conversely, if ΔL is 0 or less, the L channel signal L(t) is output as SA = 0 and the R channel signal R(t) is output unchanged as SB. This assumes that the inter-channel level difference is the L-side value minus the R-side value. When the decision is made band by band using ΔLi: for each band fi, if ΔLi is positive, the L-side divided signal L(fi) is output unchanged as SA(fi) and the R-side divided signal R(fi) is output as SB(fi) = 0; conversely, if ΔLi is 0 or less, L(fi) is output as SA(fi) = 0 and R(fi) is output as SB(fi). In this way the voice signal determination unit 601 outputs the gate control signals CL1 to CLn and CR1 to CRn, which control the gates 602L1 to 602Ln and 602R1 to 602Rn respectively. As before, this assumes that the per-band inter-channel level difference is the L-side value minus the R-side value. The signals SA(f1) to SA(fn) and SB(f1) to SB(fn) are combined as in the previous embodiment and output as the signals SA and SB at the output terminals t_A and t_B respectively (S10).
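The sign-agreement rule (S06 to S09) might be sketched as below. The text speaks of the logarithm of a level difference; this example assumes that ΔL and ΔLi are L/R power ratios, so that their logarithm is positive when the L side is stronger. That reading, the handling of exactly 80% agreement as "global", and all function names are assumptions.

```python
import math


def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def choose_decision_mode(global_ratio, band_ratios, agreement=0.8):
    """'global' when enough bands share the sign of the global value."""
    global_sign = sign(math.log10(global_ratio))
    same = sum(1 for r in band_ratios if sign(math.log10(r)) == global_sign)
    return "global" if same / len(band_ratios) >= agreement else "per_band"


def route_bands(global_ratio, band_ratios, left_bands, right_bands):
    """Return (SA, SB): each band is taken from L into SA or from R into SB."""
    if choose_decision_mode(global_ratio, band_ratios) == "global":
        to_left = [math.log10(global_ratio) > 0] * len(band_ratios)
    else:
        to_left = [math.log10(r) > 0 for r in band_ratios]
    sa = [lv if keep else 0j for lv, keep in zip(left_bands, to_left)]
    sb = [0j if keep else rv for rv, keep in zip(right_bands, to_left)]
    return sa, sb


left = [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j, 5 + 0j]
right = [6 + 0j, 7 + 0j, 8 + 0j, 9 + 0j, 10 + 0j]
mode_a = choose_decision_mode(2.0, [3.0, 1.5, 0.5, 4.0, 2.0])   # 4 of 5 bands agree
mode_b = choose_decision_mode(2.0, [3.0, 0.5, 0.5, 4.0, 2.0])   # 3 of 5 bands agree
sa, sb = route_bands(2.0, [3.0, 0.5, 0.5, 4.0, 2.0], left, right)
print(mode_a, mode_b, sa, sb)
```

When most bands agree with the broadband measurement, a single source dominates the frame and the cheaper global decision suffices; otherwise the per-band decision separates the overlapping sources.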

[0024] In the embodiments above, the voice signal determination unit 601 uses only one of the arrival time difference and the level difference as its decision condition. When only the level difference is used, however, the levels of L(fi) and R(fi) may be nearly equal in the low frequency bands, making the level difference hard to determine accurately. When only the time difference is used, the phase wraps around in the high frequency bands, so the time difference can be hard to calculate correctly. For these reasons, using the time difference in the low bands and the level difference in the high bands can be more advantageous than using a single parameter across all bands.

[0025] An embodiment in which the voice signal determination unit 601 uses both the per-band inter-channel time difference and the per-band inter-channel level difference is therefore described with reference to FIG. 7 and the subsequent figures. The functional block configuration of this embodiment is the same as in FIG. 2, but the processing in the inter-channel time difference / level difference detection unit 3, the per-band inter-channel time difference / level difference detection unit 5, and the voice signal determination unit 601 differs as follows. The inter-channel time difference / level difference detection unit 3 outputs a single time difference Δτ, such as the average of the absolute values of the detected time differences Δτ₁ and Δτ₂ or, if Δτ₁ and Δτ₂ are relatively close, just one of them. Although the inter-channel time differences Δτ₁, Δτ₂, and Δτ are calculated here before the channel signals L(t) and R(t) are divided into bands on the frequency axis, they can also be calculated after the band division.

【００２６】図７に示すように、Ｌチャネル信号Ｌ
（ｔ）、Ｒチャネル信号Ｒ（ｔ）をフレーム（例えば２
０〜４０ｍｓ）毎に読み込み（Ｓ０２）、帯域分割部４
でＬチャネル信号、Ｒチャネル信号をそれぞれ複数の周
波数帯域に分割する。この例ではＬチャネル信号Ｌ
（ｔ）、Ｒチャネル信号Ｒ（ｔ）にそれぞれハニング窓
をかけ（Ｓ０３）、それぞれフーリエ変換を施して分割
された信号Ｌ（ｆ１）〜Ｌ（ｆｎ）、Ｒ（ｆ１）〜Ｒ
（ｆｎ）を得る（Ｓ０４）。次に、帯域別チャネル間時
間差／レベル差検出部５では分割された信号の周波数ｆ
ｉが１／（２Δτ）（Δτはチャネル時間差）以下の帯
域（以下、低域と呼ぶ）であるかを調べ（Ｓ０５）、以
下であれば帯域別チャネル間位相差Δφｉを出力し（Ｓ
０８）、分割された信号の周波数ｆが１／（２Δτ）よ
り大きく１／Δτ未満の帯域（以下、中域と呼ぶ）であ
るかがチェックされ（Ｓ０６）、この中域であれば帯域
別チャネル間位相差Δφｉ及びレベル差ΔＬｉを出力し
（Ｓ０９）、分割された信号の周波数ｆが１／Δτ以上
の帯域（以下、高域と呼ぶ）かがチェックされ（Ｓ０
７）、高域であれば帯域別チャネル間レベル差ΔＬｉを
出力する（Ｓ１０）。As shown in FIG. 7, the L channel signal L(t) and the R channel signal R(t) are read in frame by frame (for example, every 20 to 40 ms) (S02), and the band division unit 4 divides each of the L and R channel signals into a plurality of frequency bands. In this example, a Hanning window is applied to each of L(t) and R(t) (S03), and each is Fourier transformed to obtain the divided signals L(f1) to L(fn) and R(f1) to R(fn) (S04). Next, the per-band inter-channel time difference/level difference detection unit 5 checks whether the frequency fi of a divided signal lies in the band at or below 1/(2Δτ), where Δτ is the inter-channel time difference (hereinafter the low band) (S05); if so, it outputs the per-band inter-channel phase difference Δφi (S08). Otherwise it checks whether fi lies in the band above 1/(2Δτ) and below 1/Δτ (hereinafter the middle band) (S06); if so, it outputs both the per-band inter-channel phase difference Δφi and the level difference ΔLi (S09). Otherwise it checks whether fi lies in the band at or above 1/Δτ (hereinafter the high band) (S07); if so, it outputs the per-band inter-channel level difference ΔLi (S10).
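The band test of steps S05 to S07 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment; the function name `classify_band` and the choice of seconds and hertz as units are assumptions made for the example.

```python
def classify_band(freq_hz, delta_tau_s):
    """Return "low", "mid", or "high" for one band frequency fi.

    delta_tau_s is the single inter-channel time difference Δτ from
    detection unit 3, assumed here to be positive and given in seconds.
    """
    if delta_tau_s <= 0:
        raise ValueError("delta_tau_s must be positive")
    low_limit = 1.0 / (2.0 * delta_tau_s)   # 1/(2Δτ)
    high_limit = 1.0 / delta_tau_s          # 1/Δτ
    if freq_hz <= low_limit:
        return "low"    # S05: output the phase difference only (S08)
    if freq_hz < high_limit:
        return "mid"    # S06: output phase and level difference (S09)
    return "high"       # S07: output the level difference only (S10)


# With Δτ = 0.5 ms the two limits fall at 1 kHz and 2 kHz.
print([classify_band(f, 0.0005) for f in (400.0, 1500.0, 3000.0)])
```

A larger Δτ (microphones farther apart, or a source far off axis) lowers both limits, so more of the spectrum is decided by level difference.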

【００２７】音声信号判定部６０１は、帯域別チャネル
間時間差／レベル差検出部５で検出された帯域別チャネ
ル間位相差、レベル差を用いてＬ（ｆ１）〜Ｌ（ｆ
ｎ）、Ｒ（ｆ１）〜Ｒ（ｆｎ）それぞれについて何れを
出力するかの判定を行う。なお、位相差Δφｉ、レベル
差ΔＬについては、この例では共にＬ側からＲ側の値を
引いて算出した値を用いる。The voice signal determination unit 601 uses the per-band inter-channel phase differences and level differences detected by the per-band inter-channel time difference/level difference detection unit 5 to decide, for each of L(f1) to L(fn) and R(f1) to R(fn), which one to output. In this example, both the phase difference Δφi and the level difference ΔLi are values obtained by subtracting the R-side value from the L-side value.

【００２８】低域と判定された信号Ｌ（ｆｉ），Ｒ（ｆ
ｉ）については図８に示すようにまず位相差Δφｉがπ
以上かを調べ（Ｓ１５）、π以上であればΔφｉから２
πを減算した値をΔφｉとし（Ｓ１７）、ステップＳ１
５でΔφｉがπ以上でなければ、−π以下かを調べ（Ｓ
１６）、以下であればΔφｉに２πを加算した値をΔφ
ｉとし（Ｓ１８）、ステップＳ１６で−π以下でなけれ
ばΔφｉをそのまま用いる（Ｓ１９）。ステップＳ１
７，Ｓ１８，Ｓ１９で求めた帯域別チャネル間位相差Δ
φｉを時間差Δσｉに次式で変換する（Ｓ２０）。For signals L(fi) and R(fi) judged to be in the low band, as shown in FIG. 8, it is first checked whether the phase difference Δφi is π or more (S15). If it is, 2π is subtracted from Δφi and the result is taken as Δφi (S17). If Δφi is not π or more in step S15, it is checked whether it is −π or less (S16); if it is, 2π is added to Δφi and the result is taken as Δφi (S18). If Δφi is not −π or less in step S16, it is used as is (S19). The per-band inter-channel phase difference Δφi obtained in step S17, S18, or S19 is then converted into a time difference Δσi by the following equation (S20).

【００２９】 Δσｉ＝１０００・Δφｉ／（２πｆｉ）　（５）
分割された信号Ｌ（ｆｉ），Ｒ（ｆｉ）が中域と判定さ
れた場合は図９に示すように帯域別チャネル間レベル差
ΔＬ（ｆｉ）を利用して、位相差Δφｉを一意に決定す
る。即ちΔＬ（ｆｉ）が正かを調べ（Ｓ２３）、正であ
れば、その帯域別チャネル間位相差Δφｉが正であるか
を調べ（Ｓ２４）、正であればそのΔφｉをそのまま出
力し（Ｓ２６）、ステップＳ２４で正でなければΔφｉ
に２πを加算した値をΔφｉとして出力する（Ｓ２
７）。ステップＳ２３でΔＬ（ｆｉ）が正でなければ、
その帯域別チャネル間位相差Δφｉが負であるかを調べ
（Ｓ２５）、負であれば、そのΔφｉをそのままΔφｉ
として出力し（Ｓ２８）、ステップＳ２５で負でなけれ
ばΔφｉから２πを減算した値をΔφｉとして出力する
（Ｓ２９）。これらステップＳ２６〜Ｓ２９の何れかの
Δφｉが次式によりその帯域別チャネル間時間差Δσｉ
として演算される（Ｓ３０）。
Δσi = 1000 · Δφi / (2π fi)   (5)
When the divided signals L(fi) and R(fi) are judged to be in the middle band, the per-band inter-channel level difference ΔL(fi) is used to determine the phase difference Δφi uniquely, as shown in FIG. 9. That is, it is checked whether ΔL(fi) is positive (S23). If it is, it is checked whether the per-band inter-channel phase difference Δφi is positive (S24); if so, Δφi is output as is (S26), and if not, the value obtained by adding 2π to Δφi is output as Δφi (S27). If ΔL(fi) is not positive in step S23, it is checked whether Δφi is negative (S25); if so, Δφi is output as is (S28), and if not, the value obtained by subtracting 2π from Δφi is output as Δφi (S29). The Δφi from whichever of steps S26 to S29 applies is converted into the per-band inter-channel time difference Δσi by the following equation (S30).

【００３０】 Δσｉ＝１０００・Δφｉ／（２πｆｉ）　（６）
以上のようにして低域、中域における帯域別チャネル間
時間差Δσｉと、高域における帯域別チャネル間レベル
差ΔＬ（ｆｉ）が得られ、これらに応じて音源信号の判
別が次のようになされる。図１０に示すように低域と中
域においては位相差Δφｉを、高域においてはレベル差
ΔＬｉを利用して両チャネルの各周波数成分を該当する
どちらかの音源の信号として判別する。具体的には、低
域と中域においては図８、９でそれぞれ求められた帯域
別チャネル間時間差Δσｉが正であるかを調べ（Ｓ３
４）、正であれば、その帯域ｉのＬ側チャネル信号Ｌ
（ｆｉ）を信号ＳＡ（ｆｉ）として出力し、Ｒ側帯域チ
ャネル信号Ｒ（ｆｉ）を０の信号ＳＢ（ｆｉ）として出
力する（Ｓ３６）。ステップＳ３４で帯域別チャネル時
間差Δσｉが正でない場合は逆にＳＡ（ｆｉ）として０
を出力し、ＳＢ（ｆｉ）としてＲ側チャネル信号Ｒ（ｆ
ｉ）を出力する（Ｓ３７）。
Δσi = 1000 · Δφi / (2π fi)   (6)
In this way, the per-band inter-channel time difference Δσi is obtained for the low and middle bands and the per-band inter-channel level difference ΔL(fi) for the high band, and the sound source signals are discriminated from them as follows. As shown in FIG. 10, each frequency component of both channels is attributed to one of the two sound sources using the phase difference Δφi in the low and middle bands and the level difference ΔLi in the high band. Specifically, in the low and middle bands it is checked whether the per-band inter-channel time difference Δσi obtained in FIG. 8 or FIG. 9 is positive (S34). If it is, the L-side channel signal L(fi) of band i is output as the signal SA(fi) and 0 is output as the signal SB(fi) (S36). If Δσi is not positive in step S34, then conversely 0 is output as SA(fi) and the R-side channel signal R(fi) is output as SB(fi) (S37).
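The phase handling of FIGS. 8 and 9 and the conversion in equations (5) and (6) can be sketched as follows. This is an illustrative sketch rather than the implementation of the embodiment: the function names are hypothetical, phases are assumed to be in radians and frequencies in hertz, and reading the factor 1000 as a conversion to milliseconds is an interpretation that the text does not state.

```python
import math


def wrap_low_band(dphi):
    """FIG. 8, S15 to S19: fold the phase difference back by one turn."""
    if dphi >= math.pi:
        return dphi - 2.0 * math.pi   # S17
    if dphi <= -math.pi:
        return dphi + 2.0 * math.pi   # S18
    return dphi                       # S19


def resolve_mid_band(dphi, dlevel):
    """FIG. 9, S23 to S29: choose the phase whose sign agrees with ΔL(fi)."""
    if dlevel > 0:                                          # S23
        return dphi if dphi > 0 else dphi + 2.0 * math.pi   # S26 / S27
    return dphi if dphi < 0 else dphi - 2.0 * math.pi       # S28 / S29


def phase_to_time(dphi, freq_hz):
    """Equations (5) and (6): Δσi = 1000 · Δφi / (2π fi)."""
    return 1000.0 * dphi / (2.0 * math.pi * freq_hz)


# A mid-band phase of -π/2 with a positive level difference becomes +3π/2.
print(resolve_mid_band(-0.5 * math.pi, 1.0) / math.pi)
```

In the middle band the measured phase alone is ambiguous by one full turn, which is why the sign of the level difference is used to pick between Δφi and Δφi ± 2π.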

【００３１】また、高域においては、図７中のステップ
Ｓ１０で検出した帯域別チャネル間レベル差ΔＬ（ｆ
ｉ）が正であるかを調べ（Ｓ３５）、正であれば信号Ｓ
Ａ（ｆｉ）としてＬ側チャネル信号Ｌ（ｆｉ）を出力
し、ＳＢ（ｆｉ）として０を出力する（Ｓ３８）。ステ
ップＳ３５でレベル差ΔＬｉが正でなければＳＡ（ｆ
ｉ）として０を出力し、ＳＢ（ｆｉ）としてＲ側帯域チ
ャネル信号Ｒ（ｆｉ）を出力する（Ｓ３９）。In the high band, it is checked whether the per-band inter-channel level difference ΔL(fi) detected in step S10 of FIG. 7 is positive (S35). If it is, the L-side channel signal L(fi) is output as the signal SA(fi) and 0 is output as SB(fi) (S38). If the level difference ΔLi is not positive in step S35, 0 is output as SA(fi) and the R-side channel signal R(fi) is output as SB(fi) (S39).
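The per-band decision of FIG. 10 (steps S34 to S39) can be sketched as a single function. The name `assign_band`, its argument layout, and the use of complex numbers for the band components are assumptions made for this example.

```python
def assign_band(kind, l_fi, r_fi, dsigma=0.0, dlevel=0.0):
    """Return the pair (SA(fi), SB(fi)) for one band.

    kind   -- "low", "mid", or "high"
    dsigma -- per-band time difference Δσi (used in the low and middle bands)
    dlevel -- per-band level difference ΔL(fi) (used in the high band)
    Both differences are taken as L side minus R side.
    """
    if kind in ("low", "mid"):
        to_a = dsigma > 0     # S34
    elif kind == "high":
        to_a = dlevel > 0     # S35
    else:
        raise ValueError("unknown band kind: %r" % (kind,))
    # S36 / S38: keep the L component; S37 / S39: keep the R component.
    return (l_fi, 0j) if to_a else (0j, r_fi)


print(assign_band("low", 1 + 2j, 3 + 4j, dsigma=0.2))
print(assign_band("high", 1 + 2j, 3 + 4j, dlevel=-3.0))
```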

【００３２】以上のようにして各帯域についてＬ側又は
Ｒ側が出力され、音源信号合成部７Ａ，７Ｂでそれぞれ
判別した各周波数成分を全帯域に渡り加算し（Ｓ４
０）、かつ、加算した各信号を逆フーリエ変換し（Ｓ４
１）、その変換した信号ＳＡ，ＳＢを出力する（Ｓ４
２）。以上説明したように、この実施例においては、周
波数帯域毎に音源分離に有利なパラメータを用いること
により、全帯域に渡り単一のパラメータを用いる場合に
比べてより分離性能の高い音源分離を実現することが可
能である。In this way, either the L side or the R side is output for each band. The sound source signal synthesis units 7A and 7B each add the frequency components assigned to them over the entire band (S40), apply an inverse Fourier transform to the sum (S41), and output the transformed signals SA and SB (S42). As explained above, this embodiment uses, in each frequency band, the parameter that is more favorable for source separation, and can therefore achieve better separation performance than when a single parameter is used across the entire band.

【００３３】この発明は音源の数が３個以上でも適用で
きる。例として、音源数が３、マイクロホン数が２であ
る場合でマイクロホンへの到達時間差を利用して音源分
離する場合を説明する。この場合、チャネル間時間差／
レベル差検出部３で各音源についてＬチャネル信号、Ｒ
チャネル信号のチャネル間時間差を算出する際に、図４
に示したように相互相関のパワーで正規化したヒストグ
ラムの、累積度数（ピーク値）第一位から第三位までを
とる各時点を求めることによって各音源信号についての
チャネル間時間差Δτ₁，Δτ₂，Δτ ₃を算出する。
そして、帯域別チャネル間時間差／レベル差検出部５に
おいても、各帯域の帯域別チャネル間時間差をΔτ₁か
らΔτ₃のどれかに決定する。この決定の仕方は、前記
実施例で述べた計算式（３），（４）と同様である。音
声信号判定部６０１では、例として、Δτ₁＞０、Δτ
₂＞０、Δτ₃＜０である場合で説明する。ここで、Δ
τ₁，Δτ₂，Δτ₃はそれぞれ、音源Ａ，Ｂ，Ｃ各信
号のチャネル間時間差と仮定し、さらに、これらの値は
Ｌ側からＲ側の値を引いて算出した値と仮定する。この
場合、音源ＡはＬ側のマイクロホン１に近く、音源Ｂは
Ｒ側のマイクロホン２の近くにある。よって、Ｌチャネ
ルの信号から、帯域別チャネル間時間差がΔτ₁となる
帯域の信号を加算して音源Ａの信号を、またΔτ₂とな
る帯域を加算して、音源Ｂの信号をそれぞれ分離するこ
とが可能である。また、Ｒチャネル信号から、帯域別チ
ャネル間時間差がΔτ₃となる帯域の信号を加算して出
力することにより、音源Ｃの信号を分離する。The present invention is also applicable when there are three or more sound sources. As an example, consider separating three sound sources with two microphones using the differences in arrival time at the microphones. When the inter-channel time difference/level difference detection unit 3 calculates the inter-channel time difference between the L and R channel signals for each source, it takes the histogram normalized by the cross-correlation power, as shown in FIG. 4, and finds the time points whose cumulative frequency (peak value) ranks first through third; these give the inter-channel time differences Δτ₁, Δτ₂, and Δτ₃ of the respective source signals. The per-band inter-channel time difference/level difference detection unit 5 likewise assigns the per-band inter-channel time difference of each band to one of Δτ₁ to Δτ₃, in the same way as with formulas (3) and (4) in the embodiment described above. The operation of the voice signal determination unit 601 is explained for the example Δτ₁ > 0, Δτ₂ > 0, Δτ₃ < 0. Here Δτ₁, Δτ₂, and Δτ₃ are assumed to be the inter-channel time differences of the signals of sound sources A, B, and C, respectively, and to have been calculated by subtracting the R-side value from the L-side value. In this case, sound source A is close to the L-side microphone 1 and sound source B is close to the R-side microphone 2. The signal of sound source A can therefore be separated by summing, from the L channel signal, the bands whose per-band inter-channel time difference is Δτ₁, and the signal of sound source B by summing the bands whose time difference is Δτ₂. The signal of sound source C is separated by summing, from the R channel signal, the bands whose per-band inter-channel time difference is Δτ₃ and outputting the result.
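The three-source routing in this example can be sketched as follows. Formulas (3) and (4) are not reproduced in this passage, so the sketch substitutes a simple nearest-value match between each band's time difference and Δτ₁ to Δτ₃; that matching rule, the function names, and the rule that a source with positive Δτ is read from the L channel are all assumptions made for illustration.

```python
def nearest_source(band_dtau, source_dtaus):
    """Index of the source whose Δτ is closest to this band's time difference."""
    return min(range(len(source_dtaus)),
               key=lambda k: abs(band_dtau - source_dtaus[k]))


def route_bands(band_dtaus, l_bands, r_bands, source_dtaus):
    """Return one list of band components per source (0 where not selected)."""
    outputs = [[0j] * len(band_dtaus) for _ in source_dtaus]
    for i, dtau in enumerate(band_dtaus):
        k = nearest_source(dtau, source_dtaus)
        # Assumption: a source with Δτ > 0 is taken from L, otherwise from R.
        outputs[k][i] = l_bands[i] if source_dtaus[k] > 0 else r_bands[i]
    return outputs


sources = [0.4, 0.1, -0.3]                      # Δτ₁, Δτ₂, Δτ₃ (A, B, C)
separated = route_bands([0.38, 0.12, -0.25],
                        [1 + 0j, 2 + 0j, 3 + 0j],
                        [10 + 0j, 20 + 0j, 30 + 0j],
                        sources)
print(separated)
```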

【００３４】上述の音源分離において、発話者２１５
と、スピーカ２１１とが固定されている場合は、発話者
２１５（又はスピーカ２１１）からの音響信号がマイク
ロホン１と２と到達する時間差Δτ₁（又はΔτ₂）は
一定であり、予め知ることができ、同様チャネル間レベ
ル差ΔＬは予め知ることができる。従って、図３中のス
テップＳ０３のチャネル間時間差Δτ₁、Δτ₂の検出
や図６中のステップＳ０３のチャネル間レベル差ΔＬの
検出は省略することができ、図２中のチャネル間時間差
／レベル差検出部３を省略できる。また帶域別チャネル
間レベル差ΔＬ（ｆｉ）を利用する場合は、図６におい
て、ステップＳ０３，Ｓ０６，Ｓ０７，Ｓ０８を省略し
て、常に各分割帯域ごとに帯域別チャネル間レベル差を
用いて音源分離をしてもよい。つまりチャネル間レベル
差は検出しなくてもよい。ただ図６に示すような処理を
行えばｐ／ｎ≧０．８が成立する場合は、処理が簡単に
なる。In the sound source separation described above, when the talker 215 and the loudspeaker 211 are at fixed positions, the time difference Δτ₁ (or Δτ₂) with which the acoustic signal from the talker 215 (or the loudspeaker 211) reaches microphones 1 and 2 is constant and can be known in advance, and the same holds for the inter-channel level difference ΔL. The detection of the inter-channel time differences Δτ₁ and Δτ₂ in step S03 of FIG. 3 and of the inter-channel level difference ΔL in step S03 of FIG. 6 can therefore be omitted, and with it the inter-channel time difference/level difference detection unit 3 in FIG. 2. When the per-band inter-channel level difference ΔL(fi) is used, steps S03, S06, S07, and S08 in FIG. 6 may be omitted and the sources always separated using the per-band inter-channel level difference of each divided band; that is, the inter-channel level difference need not be detected. However, performing the processing shown in FIG. 6 makes the processing simpler when p/n ≥ 0.8 holds.

【００３５】上述では音源信号を分離し、分離された各
音源信号ＳＡ，ＳＢを各別に出力した。しかし、例えば
一方の音源Ａは発話者による音声であり、他方の音源Ｂ
は騒音のような場合、騒音と混合された音源Ａの信号音
を分離抽出し、騒音を抑圧するためにもこの発明を適用
することができる。一方の音源Ａ、例えば発話者が他方
の音源Ｂ、つまりスピーカより周波数帯域が広い場合で
その各周波数帯域が予め知られている場合は、図１１に
示すように図２において帯域分離部１１において、両音
源信号の重なっていない周波数帯域を分離する。例えば
音源Ａの信号Ａ（ｔ）の周波数帯域はｆ１〜ｆｎである
が音源Ｂの信号Ｂ（ｔ）の周波数帯域はｆ１〜ｆｎ（ｆ
ｎ＞ｆｍ）の場合、重なっていない帯域ｆｍ＋１〜ｆｎ
の信号をマイクロホン１，２の出力から分離し、この帯
域ｆｍ＋１〜ｆｎの信号については、音声信号判定部６
０１の判定処理、場合によっては帯域別チャネル間時間
差／レベル差検出部５の処理を行わず、音声信号判定部
６０１は、音源Ｂの信号として選出するチャネル信号Ｓ
Ｂ（ｔ）として選出するＲの分割された帯域チャネル信
号Ｒ（ｆｍ＋１）〜Ｒ（ｆｎ）をそれぞれＳＢ（ｆｍ＋
１）〜ＳＢ（ｆｎ）として出力し、ＳＡ（ｆｍ＋１）〜
ＳＡ（ｆｎ）は０を出力させるように音声信号選択部６
０２を制御する。即ちゲート６０２Ｌｍ＋１〜６０２Ｌ
ｎは常閉とし、ゲート６０２Ｒｍ＋１〜６０２Ｒｎは常
開とする。In the description above, the sound source signals are separated and the separated signals SA and SB are output individually. The invention can also be applied when, for example, sound source A is a talker's voice and sound source B is noise, in order to extract the signal of source A from the mixture and suppress the noise. When one sound source A, for example the talker, has a wider frequency band than the other sound source B, that is, the loudspeaker, and both frequency bands are known in advance, a band separation unit 11 is added to the configuration of FIG. 2, as shown in FIG. 11, to separate out the frequency band in which the two source signals do not overlap. For example, if the signal A(t) of source A occupies f1 to fn while the signal B(t) of source B occupies f1 to fm (fn > fm), the non-overlapping band fm+1 to fn is separated from the outputs of microphones 1 and 2. For signals in this band fm+1 to fn, the determination processing of the voice signal determination unit 601, and in some cases the processing of the per-band inter-channel time difference/level difference detection unit 5, is not performed. Instead, the voice signal determination unit 601 controls the voice signal selection unit 602 so that the divided R-channel band signals R(fm+1) to R(fn) are output as SB(fm+1) to SB(fn), the components of the channel signal SB(t) selected as the signal of source B, and 0 is output for SA(fm+1) to SA(fn). That is, gates 602Lm+1 to 602Ln are normally closed and gates 602Rm+1 to 602Rn are normally open.

【００３６】上述では各帯域別チャネル間時間差Δσｉ
が正か負かにより、また各帯域別チャネル間レベル差Δ
Ｌｉが正か負かにより、つまり、いずれも０をしきい値
として、その帯域信号が何れのマイクロホンに近いかを
判別した。これはマイクロホン１として結ぶ線の２等分
線に対して音源Ａと音源Ｂと左右対称に位置している場
合である。この関係にない場合は判別しきい値を以下の
ように決めればよい。In the description above, the microphone nearer to the source of each band signal was determined from whether the per-band inter-channel time difference Δσi is positive or negative, or from whether the per-band inter-channel level difference ΔLi is positive or negative; in both cases the threshold is 0. This applies when sound sources A and B are located symmetrically about the perpendicular bisector of the line connecting microphones 1 and 2. When that is not the case, the decision thresholds may be set as follows.

【００３７】音源Ａの信号がマイクロホン１、マイクロ
ホン２に到達する帯域別チャネル間レベル差をΔＬ_A、
到達する帯域別チャネル間時間差をΔτ_A、音源Ｂの信
号がマイクロホン１、マイクロホン２に到達する帯域別
チャネル間レベル差をΔＬ_B、到達する帯域別チャネル
間時間差をΔτ_Bとそれぞれする。このとき、帯域別チ
ャネル間レベル差のしきい値ΔＬthは ΔＬth＝（ΔＬ_A＋ΔＬ_B）／２とし、帯域別チャネル間時間差のしきい値Δτthは Δτth＝（Δτ_A＋Δτ_B）／２とすればよい。先に述べた実施例ではΔＬ_B＝−Δ
Ｌ_A、Δτ_B＝−Δτ_Aの場合でΔＬth＝０、Δτth＝
０となる。音源Ａ，Ｂを分離できるように、二つの音源
をマイクロホン１，２に対し、互いに異なる側となるよ
うに、マイクロホン１，２を位置させ、マイクロホン
１，２に対する距離、方向は必ずしも正しくはわかって
いない場合があり、しきい値ΔＬth，Δτthを可変とし
て、分離がよく行われるようにΔＬth，Δτthを調整可
能としてもよい。Let ΔL_A and Δτ_A be, respectively, the per-band inter-channel level difference and time difference with which the signal of sound source A arrives at microphones 1 and 2, and let ΔL_B and Δτ_B be the corresponding quantities for the signal of sound source B. The threshold for the per-band inter-channel level difference may then be set to ΔLth = (ΔL_A + ΔL_B)/2, and the threshold for the per-band inter-channel time difference to Δτth = (Δτ_A + Δτ_B)/2. The embodiments described earlier correspond to the case ΔL_B = −ΔL_A and Δτ_B = −Δτ_A, which gives ΔLth = 0 and Δτth = 0. Microphones 1 and 2 are positioned so that the two sound sources lie on opposite sides of them, allowing sources A and B to be separated. Because the distances and directions of the sources relative to microphones 1 and 2 are not always known accurately, the thresholds ΔLth and Δτth may be made variable and adjusted so that the separation works well.
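The midpoint thresholds can be sketched as follows, taking the level threshold as the midpoint of ΔL_A and ΔL_B in parallel with the time threshold, which is the reading consistent with ΔLth = 0 when ΔL_B = −ΔL_A. The function names are hypothetical, and the rule for deciding which side of the threshold belongs to source A is an assumption added for the example.

```python
def midpoint_threshold(value_a, value_b):
    """ΔLth = (ΔL_A + ΔL_B) / 2, or Δτth = (Δτ_A + Δτ_B) / 2."""
    return (value_a + value_b) / 2.0


def belongs_to_a(measured, value_a, value_b):
    """True when the measured difference lies on source A's side of the threshold.

    Assumption: the side is chosen by comparing value_a with value_b, so the
    symmetric case reduces to the sign test used in the earlier embodiment.
    """
    threshold = midpoint_threshold(value_a, value_b)
    if value_a > value_b:
        return measured > threshold
    return measured < threshold


print(midpoint_threshold(6.0, -6.0))   # symmetric placement
print(midpoint_threshold(8.0, -2.0))   # asymmetric placement
```

Making `value_a` and `value_b` adjustable at run time corresponds to the variable thresholds mentioned at the end of the paragraph.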

【００３８】図１２はこのハウリング抑圧方法を、更に
改善したものである。スピーカ２１１に接続された相手
側からの伝送線２１２に分岐部２３１が挿入され、これ
により分岐された相手発話者からの音声信号は必要に応
じて遅延部２３２で遅延された後、帯域分割部２３３で
複数の周波数帯域に分割される。この分割は、帯域分割
部４で行われる分割数と等しく、かつ同様の手法により
行えばよい。この相手側より音声信号の帯域分割された
各帯域の成分が、送信可能帯域判定部２３４で分析さ
れ、その成分の周波数帯域が送信可能な周波数帯域であ
るか否かの判定がなされる。つまり、相手側からの音声
信号の周波数成分が無い帯域又は十分レベルが小さい帯
域は送信可能帯域と判定される。また分割部４は相手側
からの受信信号は、分割された帯域にその受信信号の成
分が無視できる帯域が得られる程度に狭い帯域に分割す
る。FIG. 12 shows a further improvement of this howling suppression method. A branching unit 231 is inserted in the transmission line 212 from the far end, which is connected to the loudspeaker 211. The far-end talker's voice signal branched off by it is delayed by a delay unit 232 as necessary and then divided into a plurality of frequency bands by a band division unit 233. This division should use the same number of bands as the band division unit 4 and a similar method. Each band component of the far-end voice signal is analyzed by a transmittable band determination unit 234, which decides whether that frequency band may be transmitted. That is, a band in which the far-end voice signal has no frequency component, or only a sufficiently low level, is judged to be a transmittable band. The band division unit 4 divides the signal into bands narrow enough that some of the divided bands contain only a negligible component of the signal received from the far end.

【００３９】音声信号選択部６０２Ｌと音源信号合成部
７Ａとの間に送信可能成分選択部２３５が挿入される。
音声信号選択部６０２Ｌにより、マイクロホン１の出力
信号Ｓ１から発話者２１５の音声信号と判定選択され、
更にこれら判定選択された帯域成分は、送信可能成分選
択部２３５で、送信可能帯域判定部２３４により、送信
可能な帯域と判定されたもののみが選択されて音源信号
合成部７Ａへ送られる。従って、スピーカ２１１から放
声され、ハウリングの原因となる可能性のある周波数成
分は、伝送線２１６に送出されず、ハウリングの発生を
一層確実に抑圧することができる。送信可能成分選択部
２３５としては、音声信号判定部６０１で、送信可能帯
域判定部２３４により送信可能と判定された帯域のみを
判定を行い、他の帯域は送信不可としてもよい。A transmittable component selection unit 235 is inserted between the voice signal selection unit 602L and the sound source signal synthesis unit 7A. The voice signal selection unit 602L selects, from the output signal S1 of microphone 1, the band components judged to be the voice signal of the talker 215. Of these, the transmittable component selection unit 235 passes on to the sound source signal synthesis unit 7A only those in bands that the transmittable band determination unit 234 has judged transmittable. Frequency components that are emitted from the loudspeaker 211 and could cause howling are therefore not sent out on the transmission line 216, and howling is suppressed more reliably. The function of the transmittable component selection unit 235 may instead be realized in the voice signal determination unit 601, which then makes its determination only for the bands judged transmittable by the transmittable band determination unit 234 and treats all other bands as not transmittable.
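The interplay of the transmittable band determination unit 234 and the transmittable component selection unit 235 can be sketched as follows. The function names, the use of a single level limit `max_level` for "sufficiently low", and the representation of band components as complex numbers are assumptions made for this example.

```python
def transmittable_bands(received_levels, max_level):
    """Unit 234: a band is transmittable when the far-end signal is absent or weak.

    received_levels holds one level per band of the received (far-end) signal;
    max_level is a hypothetical limit below which a level counts as negligible.
    """
    return [level <= max_level for level in received_levels]


def select_transmittable(talker_bands, transmittable):
    """Unit 235: pass talker components only in bands judged transmittable."""
    return [x if ok else 0j for x, ok in zip(talker_bands, transmittable)]


mask = transmittable_bands([0.0, 0.9, 0.05], max_level=0.1)
print(mask)
print(select_transmittable([1 + 1j, 2 + 0j, 3j], mask))
```

Here the middle band carries far-end speech, so the talker component in that band is withheld from the transmission line.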

【００４０】遅延部２３２はスピーカ２１１とマイクロ
ホン１，２との間の音響信号の伝搬時間を考慮して、遅
延量が定められる。この遅延部２３２で行う遅延作用を
得る手段としては分岐部２３１と送信可能成分選択部２
３５との間のどの処理段の後に挿入してもよい。送信可
能帯域判定部２３４の後段に点線枠２３７として示すよ
うに挿入する場合は、データを蓄積する読み書き可能な
記録部を用い、その所要の遅延量に相当する時間の後、
読み出して送信可能成分選択部２３５へ供給するように
することもできる。要は前記伝搬時間を考慮して、送信
可能成分選択部２３５の制御を遅らせて行うように遅延
される手段を設ければよい。場合によってはこれら遅延
手段を省略することもできる。The delay of the delay unit 232 is set in consideration of the propagation time of the acoustic signal between the loudspeaker 211 and the microphones 1 and 2. The means providing this delay may be inserted after any processing stage between the branching unit 231 and the transmittable component selection unit 235. When it is inserted after the transmittable band determination unit 234, as indicated by the dotted frame 237, a readable and writable storage unit that accumulates the data can be used, with the data read out after a time corresponding to the required delay and supplied to the transmittable component selection unit 235. In short, it suffices to provide a delay means that, in consideration of the propagation time, delays the control of the transmittable component selection unit 235. In some cases the delay means can be omitted.
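The variant in which the per-band decisions are stored and read out later can be pictured as a short first-in, first-out buffer. The class name `MaskDelay`, expressing the delay as a whole number of frames, and pre-filling the buffer with "transmittable" decisions are all assumptions made for this sketch.

```python
from collections import deque


class MaskDelay:
    """Delay per-band transmittable decisions by a fixed number of frames."""

    def __init__(self, delay_frames, n_bands):
        # Assumption: before any far-end signal has arrived, every band
        # is treated as transmittable.
        self._queue = deque([True] * n_bands for _ in range(delay_frames))

    def push(self, mask):
        """Store the newest decision and return the one delay_frames old."""
        self._queue.append(list(mask))
        return self._queue.popleft()


delay = MaskDelay(delay_frames=2, n_bands=2)
print(delay.push([False, True]))
print(delay.push([True, False]))
print(delay.push([True, True]))
```

With `delay_frames=0` the buffer is empty and each decision is returned immediately, which corresponds to omitting the delay means.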

【００４１】図１２の実施例ではハウリングの可能性が
ある成分を送信側（出力側）で遮断したが、受信側（入
力側）で遮断してもよい。その実施例の要部を図１３に
示す。伝送線２１２よりの受信信号は帯域分割部２４１
で複数の周波数帯域に分割される。この分割は帯域分割
部４（図２）の分割と同一とし、同一手法で行うことが
できる。この帯域分割された受信信号は周波数成分除去
部２４２に入力される。音声信号判定部６０１より得ら
れている、音声信号選択部６０２Ｌでマイクロホン１か
らの発話者２１５の音声成分を選択する制御信号が周波
数成分除去部２４２に入力され、音声信号選択部６０２
Ｌで選択しない、つまり伝送線２１６へ送信しない帯域
成分が、周波数成分除去部２４２で帯域分割された受信
信号から選択されて音響信号合成部２４３へ供給され、
二つで音響信号に合成されてスピーカ２１１へ供給され
る。音響信号合成部２４３は音源信号合成部７Ａと同様
の機能をもつものである。この構成によればスピーカ２
１１から放音される音響信号には、伝送線２１６へ送出
される周波数成分が除外されているため、ハウリングの
発生が抑圧される。In the embodiment of FIG. 12, components that could cause howling are blocked on the transmitting (output) side, but they may instead be blocked on the receiving (input) side. FIG. 13 shows the main part of such an embodiment. The signal received from the transmission line 212 is divided into a plurality of frequency bands by a band division unit 241. This division is the same as that of the band division unit 4 (FIG. 2) and can be performed by the same method. The band-divided received signal is input to a frequency component removal unit 242. The control signal obtained from the voice signal determination unit 601, with which the voice signal selection unit 602L selects the voice components of the talker 215 from microphone 1, is also input to the frequency component removal unit 242. From the band-divided received signal, the frequency component removal unit 242 selects the band components that the voice signal selection unit 602L does not select, that is, those not sent to the transmission line 216, and supplies them to an acoustic signal synthesis unit 243, where they are synthesized into an acoustic signal and supplied to the loudspeaker 211. The acoustic signal synthesis unit 243 has the same function as the sound source signal synthesis unit 7A. With this configuration, the frequency components sent out on the transmission line 216 are excluded from the acoustic signal emitted by the loudspeaker 211, so howling is suppressed.
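The receive-side variant amounts to applying the complement of the transmit selection to the received bands. A minimal sketch, assuming the control signal is available as one boolean per band; the name `remove_sent_bands` is hypothetical.

```python
def remove_sent_bands(received_bands, sent_mask):
    """Frequency component removal unit 242, as a sketch.

    sent_mask[i] is True when band i of microphone 1 is selected for
    transmission; that band is then removed from the loudspeaker feed.
    """
    return [0j if sent else x for x, sent in zip(received_bands, sent_mask)]


print(remove_sent_bands([1 + 0j, 2 + 0j, 3 + 0j], [True, False, True]))
```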

【００４２】図２の実施例で説明したように、帯域別チ
ャネル間時間差や帯域別チャネル間レベル差から、その
帯域成分が何れの音源信号に属するかを決定するしきい
値ΔＬth，Δτthは音源とマイクロホンとの相対位置に
より、好ましい値が異なる。従って、図１２中に示すよ
うにしきい値設定部２５１を設けて、音声信号判定部６
０１における判定基準、つまりしきい値ΔＬth，Δτth
を状況に応じて変更設定するようにすることが好まし
い。As explained for the embodiment of FIG. 2, the preferable values of the thresholds ΔLth and Δτth, which decide from the per-band inter-channel time difference or level difference which sound source signal a band component belongs to, depend on the relative positions of the sound sources and the microphones. It is therefore preferable to provide a threshold setting unit 251, as shown in FIG. 12, so that the decision criteria of the voice signal determination unit 601, namely the thresholds ΔLth and Δτth, can be changed to suit the situation.

【００４３】また、耐騒音性を高めるためには、基準値
設定部２５２を設けて、一定値以下のレベルの周波数成
分は無音化する無音化基準を設定して、音声信号選択部
６０２Ｌに送る。この結果、音声信号選択部６０２Ｌに
おいて、レベル差しきい値、位相差（時間差）しきい値
により選択されたマイクロホン１の収音信号の周波数成
分の中から、レベルが一定値以下の周波数成分は暗騒
音、空調騒音等の雑音成分と見なされて除去され、耐騒
音性が向上する。To improve noise robustness, a reference value setting unit 252 is provided in which a silencing criterion is set that mutes frequency components whose level is at or below a fixed value; the criterion is sent to the voice signal selection unit 602L. As a result, among the frequency components of the signal picked up by microphone 1 that the voice signal selection unit 602L has selected using the level difference threshold and the phase difference (time difference) threshold, those whose level is at or below the fixed value are regarded as noise components, such as background noise or air-conditioning noise, and are removed, which improves noise robustness.
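The silencing criterion can be sketched as a simple floor on the level of each selected component. Treating the level as the magnitude of a complex band component, and the names `silence_weak` and `floor`, are assumptions made for this example.

```python
def silence_weak(bands, floor):
    """Mute components whose level is at or below the silencing criterion."""
    return [x if abs(x) > floor else 0j for x in bands]


print(silence_weak([0.5 + 0j, 0.01j, 0.05 + 0j], floor=0.05))
```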

【００４４】ところで、ハウリングの発生を防止するに
は、基準値設定部２５２に一定値以上のレベルの周波数
成分を、その一定値以下に保持するハウリング防止基準
を追加し、音声信号選択部６０２Ｌに送る。この結果、
音声信号選択部６０２Ｌにおいて、レベル差しきい値と
位相差しきい値、あるいはこれに加えた上記無音化基準
により選択されたマイクロホン１の収音信号の周波数成
分の中から、レベルが一定値以上の周波数成分はその一
定値以下のレベルに補正される。この補正は一定値レベ
ル以上となることは瞬時的にかつたまにある場合はその
一定値レベルにクリップし、一定値レベル以上に比較的
頻繁になる場合は、ダイナミックレンジを圧縮すること
により行う。このようにすると、ハウリングの発生原因
となる音響結合量の増加を抑えることができ、ハウリン
グを防止することができる。To prevent howling, a howling prevention criterion that holds frequency components whose level is at or above a fixed value at or below that value is added to the reference value setting unit 252 and sent to the voice signal selection unit 602L. As a result, among the frequency components of the signal picked up by microphone 1 that the voice signal selection unit 602L has selected using the level difference threshold and the phase difference threshold, or using these together with the silencing criterion above, those whose level is at or above the fixed value are corrected to a level at or below it. If the level exceeds the fixed value only momentarily and occasionally, the correction clips it to that value; if it exceeds the fixed value relatively often, the correction compresses the dynamic range. This limits the increase in acoustic coupling that causes howling, so that howling can be prevented.
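The clipping form of this correction can be sketched as follows; the dynamic range compression used for frequent overshoots is not sketched here. Treating the level as the magnitude of a complex band component and preserving its phase while clipping are assumptions, as is the name `clip_level`.

```python
def clip_level(x, ceiling):
    """Limit the magnitude of one band component to ceiling, keeping its phase."""
    magnitude = abs(x)
    if magnitude <= ceiling:
        return x
    return x * (ceiling / magnitude)


clipped = clip_level(3 + 4j, ceiling=2.5)   # the input has magnitude 5
print(clipped, abs(clipped))
```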

【００４５】図１３中に示すように反響音を抑圧する構
成を付加することもできる。つまり出力端子ｔ_Aに、遅
延した回り込み信号を推定する回り込み信号推定部２６
１と、推定された遅延した回り込み信号を減ずる推定回
り込み信号減算部２６２を接続し、直接音と反響音との
伝達特性の性質を利用して、回り込み信号推定部２６１
において遅延した回り込み信号を推定して取り出す。こ
の推定処理には、例えば伝達特性の最小位相特性を考慮
した複素ケプストラム法を用いる。必要に応じて、直接
音と反響音との伝達特性は、インパルスレスポンス法で
測定することができる。推定部２６１で推定した遅延し
た回り込み信号を、回り込み信号除去部２６２で出力端
子ｔ_Aよりの分離された音源信号（発話者２１５の音声
信号）から除去して伝送線２１６へ送出する。回り込み
信号推定部２６１と回り込み信号除去部２６２による回
り込み信号の抑圧については、例えば、文献、昭和６２
年１１月２５日株式会社コロナ社発行、伊達玄訳「ディ
ジタル信号処理」に示されている。なお回り込み信号推
定部２６１と回り込み信号除去部２６２は、例えば１つ
のＤＳＰ（デジタルシグナルプロセッサ）で処理するこ
とができる。A configuration for suppressing reverberant sound can also be added, as shown in FIG. 13. A wraparound signal estimation unit 261, which estimates the delayed wraparound signal, and an estimated wraparound signal subtraction unit 262, which subtracts the estimated delayed wraparound signal, are connected to the output terminal t_A. The wraparound signal estimation unit 261 estimates and extracts the delayed wraparound signal by exploiting the properties of the transfer characteristics of the direct sound and the reverberant sound. For this estimation, the complex cepstrum method, which takes the minimum-phase property of the transfer characteristic into account, is used, for example. If necessary, the transfer characteristics of the direct and reverberant sound can be measured by the impulse response method. The delayed wraparound signal estimated by the estimation unit 261 is removed by the wraparound signal removal unit 262 from the separated sound source signal at the output terminal t_A (the voice signal of the talker 215), and the result is sent out on the transmission line 216. Suppression of the wraparound signal by the wraparound signal estimation unit 261 and the wraparound signal removal unit 262 is described, for example, in "Digital Signal Processing," translated by Gen Date, published by Corona Publishing Co., Ltd. on November 25, 1987 (Showa 62). The wraparound signal estimation unit 261 and the wraparound signal removal unit 262 can be implemented, for example, on a single DSP (digital signal processor).

【００４６】発話者２１５が一定の範囲しか移動しない
場合、その発話者２１５の側に設置したマイクロホン１
で収音された音声の周波数成分と、スピーカ２１１の側
に設置したマイクロホン２で収音された音声の周波数成
分とのレベル差や位相し差／到達時間差は、一定の範囲
内に限定される。したがって、しきい値設定部２５１に
判定基準範囲を設定し、そのレベル差範囲の位相差範囲
内のものに対してのみ信号処理し、範囲外のものは処理
の対象外とする。このようにすると、より高い精度でマ
イクロホン１の収音信号の中から、発話者２１５の発音
声が選択できる。When the talker 215 moves only within a limited area, the level difference and the phase difference or arrival time difference between the frequency components of the voice picked up by microphone 1, placed on the talker 215 side, and those picked up by microphone 2, placed on the loudspeaker 211 side, are confined to a certain range. A reference range can therefore be set in the threshold setting unit 251 so that only components falling within that level difference range and phase difference range are processed and components outside it are excluded. This allows the voice of the talker 215 to be selected from the signal picked up by microphone 1 with higher accuracy.
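The reference-range test can be sketched as follows. The function names, the inclusive range bounds, and representing an excluded component as 0 are assumptions made for this example.

```python
def in_reference_range(dlevel, dphase, level_range, phase_range):
    """True when both differences fall inside their reference ranges."""
    return (level_range[0] <= dlevel <= level_range[1]
            and phase_range[0] <= dphase <= phase_range[1])


def gate_by_range(bands, dlevels, dphases, level_range, phase_range):
    """Keep only the band components whose differences are within range."""
    return [x if in_reference_range(dl, dp, level_range, phase_range) else 0j
            for x, dl, dp in zip(bands, dlevels, dphases)]


print(gate_by_range([1 + 0j, 2 + 0j], [5.0, 12.0], [0.2, 0.2],
                    level_range=(3.0, 9.0), phase_range=(0.0, 0.5)))
```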

【００４７】なお、前記した場合と別の観点からは、ス
ピーカ２１１は固定であるため、発話者２１５の側のマ
イクロホン１で収音されたスピーカ２１１の音声の周波
数成分と、スピーカ２１１の側のマイクロホン２で収音
されたスピーカ２１１の音声の周波数成分とのレベル
差、位相差又は到達時間差は一定の範囲に限定される。
これらのレベル差、位相差／到達時間差の範囲は、音声
信号選択部６０２Ｌで破棄するための基準でもあり、こ
れらに基づいて音声信号選択部６０２Ｌでの選択を行う
ための判定基準をしきい値設定部２５１に設定すること
もできる。From another point of view, because the loudspeaker 211 is fixed, the level difference and the phase difference or arrival time difference between the frequency components of the loudspeaker 211 sound picked up by microphone 1 on the talker 215 side and those picked up by microphone 2 on the loudspeaker 211 side are also confined to a certain range. These ranges serve as criteria for components to be discarded by the voice signal selection unit 602L, and decision criteria for the selection in the voice signal selection unit 602L can be set in the threshold setting unit 251 on this basis as well.

【００４８】このハウリング抑圧においても、３個以上
のマイクロホンを使用すれば、必要な周波数成分を選択
する機能をより高精度に達成することができる。さら
に、拡声系の音響システムの回り込み音抑圧形収音装置
にこの発明を適用したが、一般の電話用送受話装置にお
いても適用することができる。また、音声信号選択部６
０２Ｌで選択されるべき周波数成分は、マイクロホン１
で収音した音声信号の周波数成分の中の特定の周波数成
分（発話者２１５の音声）に限られるものではなく、状
況に応じて、例えば発話者２１５側に空調装置の吹き出
し口がある場合、マイクロホン２で収音した周波数成分
の中の発話者２１５の音声と判定された周波数成分を選
出し、あるいは騒音が大きな環境下では両マイクロホン
１，２で収音した周波数成分の中の発話者２１５の音声
と判定された周波数成分を選択することもできる。In this howling suppression too, using three or more microphones allows the function of selecting the required frequency components to be achieved with higher accuracy. Although the invention has been described as applied to a wraparound-suppressing sound pickup device for a loudspeaking sound system, it can also be applied to an ordinary telephone transmitting and receiving device. Moreover, the frequency components to be selected by the voice signal selection unit 602L are not limited to particular components (the voice of the talker 215) of the signal picked up by microphone 1. Depending on the situation, for example when an air-conditioner outlet is located on the talker 215 side, the components judged to be the voice of the talker 215 may be selected from those picked up by microphone 2; in a very noisy environment, they may be selected from the components picked up by both microphones 1 and 2.

[0049] It was stated earlier that, when there are several talkers, the invention can be applied to separating them and transmitting one or two voice signals with the wraparound sound suppressed. In that case the synthesized voice signals of the talkers are obtained separately from one another, and suppressing or cutting off the synthesized voice signal of any talker who is not speaking further improves the quality of the transmitted voice signal. To do this, it is detected whether each talker is speaking; that is, the sound source that is not sounding is detected and a suppression signal for the corresponding synthesized voice signal is generated. The way this suppression signal is generated is briefly described here.

[0050] As shown in FIG. 14, the microphones M1, M2, and M3 are placed, for example, at the vertices of an equilateral triangle with sides of 20 cm. Space is partitioned on the basis of the directional characteristics of the microphones M1 to M3, and each partition is called a sound source zone. When all of the microphones M1 to M3 are omnidirectional and have the same characteristics, the space is divided into six zones Z1 to Z6, for example as shown in FIG. 12. That is, the straight lines passing through each of the microphones M1, M2, M3 and their center point Cp form six zones Z1 to Z6 that divide the space around Cp at equal angular intervals. The sound source A is located in the zone Z3 and the sound source B in the zone Z4. In other words, the sound source zones are determined from the placement and characteristics of the microphones M1 to M3 so that one sound source belongs to one zone.

[0051] In FIG. 14, the band division unit 41 divides the first-channel acoustic signal S1 picked up by the microphone M1 into n frequency band signals S1(f1) to S1(fn); the band division unit 42 divides the second-channel acoustic signal S2 picked up by the microphone M2 into n frequency band signals S2(f1) to S2(fn); and the band division unit 43 divides the third-channel acoustic signal S3 picked up by the microphone M3 into n frequency band signals S3(f1) to S3(fn). The bands f1 to fn are common to the band division units 41 to 43, and a discrete Fourier transformer can be used for this band division.
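The band division of paragraph [0051] can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation of the band division units 41 to 43: the names `dft` and `split_into_bands`, the naive transform, and the equal-width band edges are all choices made here for the example.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame of real samples."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def split_into_bands(frame, n_bands):
    """Group the non-negative-frequency DFT bins of a frame into n_bands bands.

    The band edges depend only on the frame length and n_bands (an assumed,
    equal-width layout), so using the same arguments for every channel gives
    bands f1..fn that are common to all channels.
    """
    spectrum = dft(frame)[: len(frame) // 2 + 1]
    edges = [i * len(spectrum) // n_bands for i in range(n_bands + 1)]
    return [spectrum[edges[i]:edges[i + 1]] for i in range(n_bands)]
```

Calling `split_into_bands` once per microphone frame with the same `n_bands` yields the per-channel band signals that the later level and time-difference comparisons operate on.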

[0052] The sound source separation unit 80 separates the sound source signals using the method described with reference to FIGS. 2 to 11. Because FIG. 14 has three microphones, however, the same processing is applied to each pair of the three channel signals. The band division unit inside the sound source separation unit 80 can therefore be shared with the band division units 41 to 43. The band-specific level (power) detection unit 51 detects the level (power) signals P(S1f1) to P(S1fn) of the band signals S1(f1) to S1(fn) obtained by the band division unit 41; likewise, the band-specific level detection units 52 and 53 detect the level signals P(S2f1) to P(S2fn) and P(S3f1) to P(S3fn) of the band signals S2(f1) to S2(fn) and S3(f1) to S3(fn) obtained by the band division units 42 and 43, respectively. This band-specific level detection can also be realized with a Fourier transformer: each channel signal is decomposed into a spectrum by a discrete Fourier transform, and the power of each spectral component is computed. Accordingly, the power spectrum of each channel signal may be obtained first and then divided into bands. In effect, the band-specific level detection unit 400 divides the channel signals of the microphones M1 to M3 into bands and outputs their levels (power).
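The alternative mentioned in paragraph [0052], obtaining the power spectrum first and then dividing it into bands, might look like the following sketch. The function name `band_levels` and the equal-width band edges are assumptions for illustration; the patent does not specify how bins are grouped.

```python
import cmath
import math


def band_levels(frame, n_bands):
    """Return the band levels P(f1)..P(fn) of one channel frame.

    The power spectrum (squared magnitude of each non-negative-frequency DFT
    bin) is computed first and then summed within each band.
    """
    n = len(frame)
    half = n // 2 + 1
    power = []
    for k in range(half):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    # Assumed layout: n_bands contiguous bands of (nearly) equal width.
    edges = [i * half // n_bands for i in range(n_bands + 1)]
    return [sum(power[edges[i]:edges[i + 1]]) for i in range(n_bands)]
```

Applying `band_levels` to the frames of the three channels gives the values P(S1fi), P(S2fi), P(S3fi) that are compared band by band.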

[0053] Meanwhile, the full-band level detection unit 61 detects the level (power) P(S1) of all frequency components of the first-channel acoustic signal S1 picked up by the microphone M1, and the full-band level detection units 62 and 63 detect the levels P(S2) and P(S3) of all frequency components of the second- and third-channel acoustic signals S2 and S3 picked up by the microphones M2 and M3, respectively.

[0054] The sound source state determination unit 70 determines, by computer processing, the sound source zones that are not emitting sound. First, the band-specific levels P(S1f1) to P(S1fn), P(S2f1) to P(S2fn), and P(S3f1) to P(S3fn) obtained by the band-specific level detection unit 50 are compared with one another for signals of the same band. The channel with the highest level is then identified for each of the bands f1 to fn.
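The per-band comparison of paragraph [0054] amounts to an argmax over channels for each band. A minimal sketch, assuming the levels are held in a list of lists indexed as `levels[channel][band]` (a layout chosen here, not taken from the patent):

```python
def loudest_channel_per_band(levels):
    """Return, for each band, the index of the channel with the highest level.

    levels[c][i] is the level of channel c in band i. On a tie the
    lowest-numbered channel wins, which is an arbitrary choice of this sketch.
    """
    n_channels = len(levels)
    n_bands = len(levels[0])
    return [
        max(range(n_channels), key=lambda c: levels[c][i])
        for i in range(n_bands)
    ]
```

With channels 0, 1, 2 standing for the microphones M1, M2, M3, the returned list assigns one microphone to each of the bands f1 to fn.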

[0055] As described above, by making the number of bands n at least a predetermined number, each band can be regarded as containing the acoustic signal of only one sound source, so the levels P(S1fi), P(S2fi), and P(S3fi) of the same band fi can be regarded as levels of sound from the same source. Therefore, when the levels P(S1fi), P(S2fi), and P(S3fi) of the same band differ among the first to third channels, the band level is highest in the channel of the microphone closest to the sound source.

[0056] As a result of this processing, the channel with the highest level is assigned to each of the bands f1 to fn. For each of the first to third channels, the total numbers χ1, χ2, χ3 of bands, among the n bands, in which that channel has the highest level are calculated. The larger this total for a channel, the closer its microphone can be regarded as being to the sound source. If the total is, for example, about 90n/100 or more, the sound source can be judged to be close to that channel's microphone. However, if the largest total is 53n/100 and the next largest is 49n/100, it is not clear whether the sound source is close to each of the corresponding microphones. Therefore, when the total exceeds a preset reference value ThP, for example about n/3, the sound source is judged to be closest to the microphone of the channel corresponding to that total.
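The counting and thresholding of paragraph [0056] can be illustrated as follows. The function names and the list-based representation are assumptions of this sketch; only the totals χ and the reference value ThP come from the text.

```python
def band_totals(winners, n_channels):
    """Count, per channel, the bands in which that channel had the highest level.

    winners[i] is the index of the loudest channel in band i; the result
    corresponds to the totals chi_1, chi_2, chi_3 of the description.
    """
    totals = [0] * n_channels
    for channel in winners:
        totals[channel] += 1
    return totals


def channels_over_threshold(totals, th_p):
    """Return the channels whose band total exceeds the reference value ThP."""
    return [c for c, total in enumerate(totals) if total > th_p]
```

For instance, with n = 100 bands and ThP = n/3, totals of 53 and 47 for two channels both exceed the threshold, which is the ambiguous situation the description resolves later by giving one source priority.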

[0057] The levels P(S1) to P(S3) of the channels detected by the full-band level detection unit 60 are also input to the sound source state determination unit 70, and when all of them are at or below a preset reference value ThR, it is determined that no zone contains a sound source. On the basis of the determination by the sound source state determination unit 70, a control signal is generated and the signal suppression unit 90 suppresses the acoustic signals A and B separated by the sound source separation unit 80. That is, the control signal SAi suppresses (attenuates or removes) the acoustic signal SA, the control signal SBi suppresses the acoustic signal SB, and the control signal SABi suppresses both acoustic signals SA and SB. For example, normally closed switches 9A and 9B are provided in the signal suppression unit 90, and the output terminals t_A and t_B of the sound source separation unit 80 are connected to the output terminals t_A′ and t_B′ through the switches 9A and 9B; the control signal SAi opens the switch 9A, the control signal SBi opens the switch 9B, and the control signal SABi opens both switches 9A and 9B. Naturally, the frame of signal separated in the sound source separation unit 80 and the frame from which the control signal used for suppression in the signal suppression unit 90 is obtained are the same frame. The generation of the suppression (control) signals SAi, SBi, and SABi is explained next.
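The normally closed switches 9A and 9B of paragraph [0057] can be modeled very simply. In this sketch the control signal is represented as a string or `None`, and suppression is modeled as complete muting; both are assumptions made for illustration, since the description also allows mere attenuation.

```python
def suppress(frame_sa, frame_sb, control):
    """Model switches 9A and 9B acting on one frame of SA and one frame of SB.

    control is None (no control signal), "SAi", "SBi", or "SABi".
    A switch that is opened replaces its frame with silence.
    """
    open_a = control in ("SAi", "SABi")  # switch 9A opened
    open_b = control in ("SBi", "SABi")  # switch 9B opened
    out_a = [0.0] * len(frame_sa) if open_a else list(frame_sa)
    out_b = [0.0] * len(frame_sb) if open_b else list(frame_sb)
    return out_a, out_b
```

Because the switches are normally closed, passing `None` leaves both separated signals unchanged.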

[0058] Suppose that the sound sources A and B are located as shown in FIG. 15, that the microphones M1 to M3 are arranged as illustrated, and that the zones Z1 to Z6 are determined so that the sources A and B lie in the separate zones Z3 and Z4, respectively. The distances SA1, SA2, SA3 from the source A to the microphones M1 to M3 then satisfy SA2 < SA3 < SA1, and the distances SB1, SB2, SB3 from the source B to the microphones M1 to M3 satisfy SB3 < SB2 < SB1.

[0059] When all of the detection signals P(S1) to P(S3) of the full-band level detection unit 60 are smaller than the reference value ThR, the sources A and B are regarded as not sounding (for example, not speaking), and the control signal SABi suppresses both acoustic signals SA and SB. The output acoustic signals SA and SB then become silent signals (101 and 102 in FIG. 16). When only the source A is sounding, the frequency components of all bands of its acoustic signal reach the microphone M2 at the highest sound pressure level (power), so the band total χ2 of the microphone M2 channel is the largest.

[0060] Likewise, when only the source B is sounding, the frequency components of all bands of its acoustic signal reach the microphone M3 at the highest sound pressure level, so the band total χ3 of the microphone M3 channel is the largest. When the sources A and B are both sounding, the numbers of bands in which the acoustic signal arrives at the highest sound pressure level are roughly even between the microphones M2 and M3.

[0061] Therefore, using the reference value ThP described above, a sound source is judged to exist in the zone served by a microphone when the total number of bands in which the acoustic signal reaches that microphone at the highest sound pressure level exceeds ThP; in this way the zone of the sounding source can be detected. In the above example, when only the source A is sounding, only χ2 exceeds the reference value ThP and the sounding source is detected to be in the zone Z3, served by the microphone M2; the control signal SBi therefore suppresses the voice signal SB and only the acoustic signal SA is output (103 and 104 in FIG. 16).

[0062] When the sources A and B are both sounding and both χ2 and χ3 exceed the reference value ThP, priority can be given to, for example, the source A, and the case handled as if only the source A were sounding. The processing procedure of FIG. 16 is arranged in this way. When neither χ2 nor χ3 reaches the reference value ThP, both sources A and B are judged to be sounding as long as the levels P(S1) to P(S3) exceed the reference value ThR; none of the control signals SAi, SBi, SABi is output, and the signal suppression unit 90 does not suppress the synthesized signals SA and SB (107 in FIG. 16).
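The decision procedure described for FIG. 16 (steps 101 to 107) can be summarized in a short function. This is a reading of the text rather than a transcription of the figure: the function name, the string return values, and the strict comparison against ThR (the description says "smaller than" in one place and "at or below" in another) are assumptions.

```python
def control_signal(full_levels, chi2, chi3, th_r, th_p):
    """Choose the suppression control signal for one frame.

    full_levels holds P(S1), P(S2), P(S3); chi2 and chi3 are the band totals
    of the channels of microphones M2 (source A side) and M3 (source B side).
    Returns "SABi", "SBi", "SAi", or None when nothing is suppressed.
    """
    if all(level < th_r for level in full_levels):  # step 101: everything quiet
        return "SABi"                                # step 102: suppress SA and SB
    if chi2 > th_p:                                  # step 103: source A sounding
        return "SBi"                                 # step 104: suppress SB only
    if chi3 > th_p:                                  # step 105: source B sounding
        return "SAi"                                 # step 106: suppress SA only
    return None                                      # step 107: no suppression
```

Testing χ2 before χ3 is what gives the source A priority when both totals exceed ThP.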

[0063] In this way, of the sound source signals SA and SB separated by the sound source separation unit 80, the one corresponding to a source judged by the sound source state determination unit 70 not to be sounding is suppressed by the signal suppression unit 90, so that unwanted sound is suppressed. Such control signals can also be generated using the arrival time differences in each band. That is, in FIG. 14 the inter-band level difference detection unit 51 detects, instead of level differences, the arrival time difference signals An(S1f1) to An(S1fn), and the arrival time difference signals An(S2f1) to An(S2fn) and An(S3f1) to An(S3fn) are detected in the same way. These arrival time difference signals can be obtained, for example, by computing the phase (or group delay) of the signal in each band by a Fourier transform and comparing the phases of the signals S1(fi), S2(fi), S3(fi) (i = 1, 2, …, n) of the same band fi with one another, which yields a signal corresponding to the arrival time difference of the same sound source signal. In this case too, the band division unit 40 divides finely enough that each band can be regarded as containing only one sound source signal component.

[0064] The arrival time difference can be expressed, for example, by taking one of the microphones M1 to M3 as the reference and setting the arrival time difference for that reference microphone to 0. The arrival time difference for each other microphone can then be expressed as a value with a positive or negative sign, according to whether the signal arrived earlier or later than at the reference microphone. If the reference microphone is, say, M1, the arrival time difference signals An(S1f1) to An(S1fn) are all 0.
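A signed arrival time difference relative to a reference microphone can be derived from the phase of a single DFT bin, as the description suggests. The sketch below assumes a known sample rate, a bin index k of at least 1, and a true delay shorter than half a period of the bin frequency (otherwise the phase wraps); the names and the sign convention (positive means later than the reference) are choices made here.

```python
import cmath
import math


def bin_coefficient(frame, k):
    """DFT coefficient of bin k for one frame of real samples."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))


def arrival_time_difference(frame_ref, frame_other, k, sample_rate):
    """Delay in seconds of frame_other relative to frame_ref in bin k.

    Positive values mean the signal reached the other microphone later than
    the reference microphone; the reference compared with itself gives 0.
    """
    freq = k * sample_rate / len(frame_ref)
    cross = bin_coefficient(frame_ref, k) * bin_coefficient(frame_other, k).conjugate()
    return cmath.phase(cross) / (2 * math.pi * freq)
```

Evaluating this for each band and each non-reference microphone gives values playing the role of the signals An(S2fi) and An(S3fi), with An(S1fi) equal to 0 when M1 is the reference.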

[0065] The sound source state determination unit 70 compares the arrival time difference signals An(S1f1) to An(S1fn), An(S2f1) to An(S2fn), and An(S3f1) to An(S3fn) with one another for signals of the same band. This determines, for each of the bands f1 to fn, the channel at which the signal arrives earliest. For each channel, the total number of bands in which the signal was judged to arrive earliest is then calculated, and the totals are compared between channels. The larger this band total for a channel, the closer its microphone can be regarded as being to the sound source. When the band total for a channel exceeds the preset reference value ThP, it is determined that a sound source exists in the zone served by that channel's microphone.
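The time-difference variant of the zone decision in paragraph [0065] can be sketched in the same style. Here `arrival[c][i]` is assumed to hold the signed arrival time of channel c in band i relative to the reference microphone, with smaller values meaning earlier arrival; this layout and the function names are illustrative only.

```python
def earliest_band_totals(arrival):
    """Count, per channel, the bands in which the signal arrived earliest."""
    n_channels = len(arrival)
    n_bands = len(arrival[0])
    totals = [0] * n_channels
    for i in range(n_bands):
        earliest = min(range(n_channels), key=lambda c: arrival[c][i])
        totals[earliest] += 1
    return totals


def zones_with_source(totals, th_p):
    """Return the channels whose band total exceeds the reference value ThP."""
    return [c for c, total in enumerate(totals) if total > th_p]
```

The returned channel indices identify the microphones whose zones are judged to contain a sounding source.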

[0066] Suppose that the microphones M1 to M3 are arranged with respect to the sound sources A and B as shown in FIG. 15, and let χ1 be the band total described above for the microphone M1 channel and χ2, χ3 the band totals for the channels of the microphones M2 and M3. The processing procedure shown in FIG. 16 can be followed in this case as well. That is, first, when all of the detection signals P(S1) to P(S3) of the full-band level detection unit 60 are smaller than the reference value ThR (101), the sources A and B are regarded as not sounding, and the control signal SABi is generated (102) to suppress both sound source signals SA and SB. The output signals SA′ and SB′ then become silent signals.

[0067] When only the source A is sounding, the frequency components of all bands of its signal reach the microphone M2 earliest, so the band total χ2 of the microphone M2 channel is the largest. When only the source B is sounding, the frequency components of all bands of its signal reach the microphone M3 earliest, so the band total χ3 of the microphone M3 channel is the largest.

[0068] When the sources A and B are both sounding, the numbers of bands in which the source signal arrives earliest are roughly even between the microphones M2 and M3. Therefore, using the reference value ThP described above, when the total number of bands in which the source signal reaches a given microphone earliest exceeds ThP, it is determined that a sound source exists in the zone served by that microphone and that this source is sounding.

[0069] In the above example, when only the source A is sounding, only χ2 exceeds the reference value ThP (103 in FIG. 16) and the sounding source is detected to be in the zone Z3, served by the microphone M2; the control signal SBi is therefore generated (104), the acoustic signal SB is suppressed, and only the signal SA is output. When only the source B is sounding, only χ3 exceeds the reference value ThP (105) and the sounding source is detected to be in the zone Z4, served by the microphone M3; the control signal SAi is therefore generated (106), the signal SA is suppressed, and only the signal SB is output.

[0070] In this example ThP is set to, for example, about n/3, so when the sources A and B are both sounding, both χ2 and χ3 may exceed the reference value ThP. In that case, as shown in the processing procedure of FIG. 13, one source (A in this example) can be given priority so that only the separated signal for the source A is output. When neither χ2 nor χ3 reaches the reference value ThP, both sources A and B are judged to be sounding as long as the levels P(S1) to P(S3) exceed the reference value ThR; the control signals SAi, SBi, SABi are not output (107 in FIG. 16), and the signal suppression unit 90 does not suppress the voice signals SA and SB.

[0071] FIG. 17 shows the functional configuration of an example in which this method of suppressing or silencing a synthesized sound signal that is not sounding is applied to the wraparound-suppression sound pickup device; parts corresponding to those in FIGS. 2, 12, and 14 bear the same reference numerals. In this case, the channel signals from the microphones 1 and 2 are each divided into a plurality of bands by the band division unit 4 and supplied to the audio signal selection unit 602L, the band-specific inter-channel time difference/level difference detection unit 5, and the band-specific level/time difference detection unit 50. The outputs of both microphones 1 and 2 are also supplied to the inter-channel time difference/level difference detection unit 3, whose inter-channel time difference or level difference is supplied to the band-specific inter-channel time difference/level difference detection unit 5 and to the audio signal determination unit 601. The levels of the outputs of the microphones 1 and 2 are supplied to the sound source state determination unit 70.

[0072] The output of the band-specific inter-channel time difference/level difference detection unit 5 is supplied to the audio signal determination unit 601, which determines, as described above, which sound source the component in each band belongs to. On the basis of this determination, the audio signal selection unit 602L selects only the acoustic signal components of a specific source (in this example, the voice components of one talker) and supplies them to the sound source signal synthesis unit 7. Meanwhile, the band-specific level/time difference detection unit 50 detects the level or arrival time difference of each band. From these detection outputs the sound source state determination unit 70 detects, as described above, which sources are or are not sounding, and the signal suppression unit 90 suppresses the synthesized sound source signal of any source that is not sounding.

[0073] The method of suppressing synthesized sound source signals that are not sounding can likewise be applied to the wraparound-suppression sound pickup devices shown in FIGS. 13 and 14. The frequency band divisions in the band division unit 4 in FIG. 2, the band division units 40 in FIG. 14, the band division unit 233 in FIG. 12, and the band division unit 241 in FIG. 13 need not be identical; the numbers of bands may differ from one another according to the accuracy required. When inter-band level differences are used, the band division unit 4 in FIG. 2, the band division unit 233 in FIG. 12, and the band division units 40 in FIG. 14 may each first obtain the power spectrum of the input signal for the subsequent processing and then divide it into a plurality of frequency bands.

【００７４】[0074]

[Effects of the Invention] As described above, according to the invention, the output signals of a plurality of microphones are divided into a plurality of sufficiently narrow bands, the parameter values of the acoustic signal in each band are detected, the differences between them are detected for the same band, and these parameter value differences are compared with thresholds. The talker's voice signal can thereby be correctly separated from other acoustic signals, and howling can be sufficiently suppressed with a comparatively simple configuration. Moreover, there is little degradation of the sound.

[0075] Furthermore, the received signal is divided into a plurality of bands narrow enough that there exist bands in which its level (power) is sufficiently negligible. Howling can then be suppressed even more reliably in either of two ways: by taking the components of the separated and extracted voice signal only in the bands where the received signal is negligible and synthesizing them, or by removing from the band-divided received signal the components of the bands that are synthesized and transmitted, then synthesizing the remaining band-divided received signal and supplying it to the electroacoustic transducer.

[Brief description of drawings]

FIG. 1 is a block diagram showing the main configuration of the device of the invention.

FIG. 2 is a block diagram showing the functional configuration of an embodiment of the sound source separation unit used in the invention.

FIG. 3 is a flowchart showing the processing procedure of an embodiment of the sound source separation method used in the invention.

FIG. 4 is a flowchart showing an example of the processing procedure for obtaining the inter-channel time differences Δτ₁, Δτ₂ in FIG. 3.

FIG. 5: A and B are diagrams each showing an example of the spectrum of one of two sound source signals.

FIG. 6 is a flowchart showing the processing procedure of an embodiment of the sound source separation method that separates sources using inter-channel level differences.

FIG. 7 is a flowchart showing part of the processing procedure of an embodiment of the sound source separation method that uses inter-channel level differences and inter-channel arrival time differences.

FIG. 8 is a flowchart showing the continuation of step S08 in FIG. 7.

FIG. 9 is a flowchart showing the continuation of step S09 in FIG. 7.

FIG. 10 is a flowchart showing the continuation of step S10 in FIG. 7 and of steps S20 and S30 in FIGS. 7 and 8.

FIG. 11 is a block diagram showing the functional configuration of an embodiment that separates sound source signals having different frequency bands.

FIG. 12 is a block diagram showing the functional configuration of an embodiment of the receiving device of the invention.

FIG. 13 is a block diagram showing part of the functional configuration of another embodiment.

FIG. 14 is a block diagram showing the functional configuration of an embodiment of a sound source separation unit to which a configuration for suppressing unwanted sound source signals using level differences has been added.

FIG. 15 is a diagram showing an example arrangement of three microphones, the zones they serve, and two sound sources.

FIG. 16 is a flowchart showing an example of the procedure for detecting the sound source zone and generating the suppression control signal when one sound source is sounding.

FIG. 17 is a block diagram showing the functional configuration of still another embodiment of the invention.

Continuation of the front page
(72) Inventor: Hiroyuki Matsui, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(56) References: JP-A-8-84392 (JP, A); JP-A-6-292292 (JP, A); JP-A-5-199590 (JP, A); JP-A-59-161995 (JP, A)
(58) Fields searched (Int.Cl.⁷, DB name): H04R 3/04; G10K 15/00; G10L 13/00; H04S 7/00

Claims

(57) [Claims]

1. A receiving method comprising: a band division step of dividing each of the output channel signals of a plurality of microphones separated from each other into a plurality of frequency bands; a band-specific parameter value detection step of detecting, for each band of each of the divided channel signals, the value of a parameter that changes with the positions of the plurality of microphones; a parameter value difference detection step of detecting, for each identical band, the difference between channels in the detected parameter values; a voice signal selection step of using the detected parameter value differences to select, band by band and on the basis of preset thresholds, the voice signal components from the band-divided channel signals; a voice synthesis step of synthesizing the voice signal components of the selected bands into a voice signal over the entire band; a second band-specific parameter value detection step of dividing each of the channel signals into a plurality of bands and detecting, for each band of each of the divided channel signals, the value of a parameter that changes with the positions of the plurality of microphones; a sound source state determination step of comparing the band-specific parameter values detected in the second band-specific parameter value detection step between channels for the same band and detecting, from the result of the comparison, a talker who is not speaking; and a signal suppression step of suppressing, in accordance with the detection signal for the non-speaking talker from the sound source state determination step, the synthesized signal corresponding to the non-speaking talker among the voice signals synthesized in the voice synthesis step, so that, of the voice signals synthesized in the voice synthesis step, the synthesized voice signal that is not suppressed is provided.

2. A receiving method comprising:
a step of obtaining, from the cross-correlation of the output channel signals L(f) and R(f) of two microphones placed apart from each other, the time differences Δτ₁ and Δτ₂ (called inter-channel time differences) with which the acoustic signals from sound sources A and B reach the respective microphones;
a step of applying a discrete Fourier transform to each of the channel signals L(f) and R(f) and dividing the result into frequency bands, each with a bandwidth narrow enough that mainly the signal component of only one sound source is present, to obtain band signals L(fi) and R(fi), where i = 1, ..., n and n is the number of divided bands;
a step of determining, from the phase difference Δφi between the two signals L(fi) and R(fi) and the frequency fi, the integers ki1 and ki2 that minimize εi1 and εi2 in
\[ \Delta\tau_1 - \left\{ \frac{\Delta\phi_i}{2\pi f_i} + \frac{k_{i1}}{f_i} \right\} = \varepsilon_{i1} \]
\[ \Delta\tau_2 - \left\{ \frac{\Delta\phi_i}{2\pi f_i} + \frac{k_{i2}}{f_i} \right\} = \varepsilon_{i2} \]
and setting the inter-channel time difference Δτj (j = 1, 2) that gives the smaller of the minimized values εi1 and εi2 as the inter-channel time difference Δτij of band i;
a step of selecting, in band units, the signal L(fi) or R(fi) of each band i whose time difference Δτij is Δτ₁ as a voice signal component;
a step of synthesizing the selected voice signal components L(fi) over bands 1 to n and applying an inverse Fourier transform to synthesize a voice signal; and
a step of outputting the synthesized voice signal.
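
The per-band assignment in claim 2 can be illustrated with a minimal sketch. It is not the patented implementation: the function name `assign_band_delay`, the use of the absolute value of ε as the quantity to minimize, and the closed-form choice of the integer k by rounding are assumptions made for illustration only.

```python
import math

def assign_band_delay(dphi, f, candidates):
    """Pick the candidate inter-channel delay that best explains one band.

    dphi       -- phase difference of the band between the two channels (rad)
    f          -- centre frequency of the band (Hz)
    candidates -- candidate delays (dtau_1, dtau_2) from the cross-correlation (s)
    Returns (j, k, eps): 1-based candidate index, chosen integer, residual.
    """
    # Delay implied by the phase difference, known only modulo one period 1/f.
    base = dphi / (2.0 * math.pi * f)
    best = None
    for j, tau in enumerate(candidates, start=1):
        # Assumption: the integer k that minimizes |eps| is the nearest integer.
        k = round((tau - base) * f)
        eps = tau - (base + k / f)
        if best is None or abs(eps) < abs(best[2]):
            best = (j, k, eps)
    return best

# Example call: a 1 kHz band whose phase matches a 0.2 ms delay.
result = assign_band_delay(2 * math.pi * 1000 * 0.0002, 1000.0, (0.0002, -0.0015))
```

A band whose returned index is 1 would be treated as dominated by the source with delay Δτ₁ and kept; the other bands would be dropped before the inverse transform.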

3. A receiving method comprising:
a step of obtaining the level difference ΔL between the output channel signals of a plurality of microphones placed apart from each other (called the inter-channel level difference);
a step of dividing each output channel signal into n frequency bands to obtain band-specific channel signals;
a step of obtaining, for each same divided band, the level difference ΔLi between the band-specific channel signals (called the band-specific level difference), where i = 1, ..., n and n is the number of divided bands;
a step of finding the number of bands in which the sign (+ or −) of the band-specific level difference ΔLi matches the sign of the inter-channel level difference ΔL;
a step of determining whether the number of matching bands is greater than or equal to a predetermined value;
a step of, if the number is determined to be greater than or equal to the predetermined value, outputting the one channel signal for which the inter-channel level difference ΔL is positive;
a voice signal selection step of, if the number is determined not to be greater than or equal to the predetermined value, selecting as voice signal components, from the corresponding one channel signal, the band-specific channel signals according to their band-specific level differences ΔLi;
a step of synthesizing the voice signal components of the selected bands into a voice signal over the entire band; and
a step of outputting the synthesized voice signal.
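
The sign-matching decision in claim 3 can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name `choose_selection_mode`, the channel labels "L" and "R", the convention that ΔL is the level of L minus the level of R, and the per-band rule (assign a band to L when ΔLi is positive) are all assumptions.

```python
def choose_selection_mode(delta_l, band_deltas, min_matches):
    """Decide between passing one whole channel and selecting band by band.

    delta_l     -- inter-channel level difference over the whole signal
    band_deltas -- band-specific level differences, one per band
    min_matches -- the predetermined number of sign-matching bands
    """
    def sign(x):
        return (x > 0) - (x < 0)

    # Count the bands whose level-difference sign agrees with the overall sign.
    matches = sum(1 for d in band_deltas if sign(d) == sign(delta_l))
    if matches >= min_matches:
        # Enough agreement: treat one channel as dominated by a single source.
        return ("whole_channel", "L" if delta_l > 0 else "R")
    # Otherwise fall back to a per-band decision (assumed rule: sign of each band).
    return ("per_band", ["L" if d > 0 else "R" for d in band_deltas])

# Example call with four bands and a required agreement of three bands.
mode = choose_selection_mode(2.0, [1.0, 0.5, -0.2, 3.0], 3)
```

The whole-channel branch avoids band-by-band switching when nearly every band already points to the same source.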

4. A receiving method comprising:
a step of detecting, from the output channel signals of a plurality of microphones placed apart from each other, the time difference with which the acoustic signal from a sound source reaches the microphones (called the inter-channel time difference);
a step of dividing each output channel signal into a plurality of frequency bands and obtaining, for each divided band, the time difference between the output channel signals (called the band-specific inter-channel time difference) and the level difference between them (called the band-specific inter-channel level difference);
a step of dividing each band-divided output channel signal into three frequency regions, low, mid and high, based on the inter-channel time difference;
a step of selecting, in band units, the voice signal component from the band-divided channel signals,
in the low frequency region, using the band-specific inter-channel time difference and based on a preset threshold,
in the mid frequency region, judging whether the band-specific inter-channel level difference is positive, calculating the band-specific inter-channel time difference based on the judgment result, and using that time difference and based on a preset threshold, and
in the high frequency region, using the band-specific inter-channel level difference and based on a preset threshold;
a step of synthesizing the voice signal components of the selected bands into a voice signal over the entire band; and
a step of outputting the synthesized voice signal.
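
The three-region split in claim 4 can be sketched as below. The claim only says the regions are derived from the inter-channel time difference; the boundary frequencies used here, 1/(2Δτ) and 1/Δτ, are an assumption for illustration, as are the function name `classify_band` and the returned labels.

```python
def classify_band(f, delta_tau):
    """Return the region of a band at frequency f (Hz) and the cue it would use.

    delta_tau -- inter-channel time difference in seconds (must be non-zero).
    """
    if delta_tau == 0:
        raise ValueError("delta_tau must be non-zero")
    # Assumed boundaries: below 1/(2*dtau) the phase difference is unambiguous,
    # between the two edges its sign is ambiguous, above 1/dtau it wraps freely.
    low_edge = 1.0 / (2.0 * abs(delta_tau))
    high_edge = 1.0 / abs(delta_tau)
    if f < low_edge:
        return ("low", "time difference")
    if f < high_edge:
        return ("mid", "level-difference sign, then time difference")
    return ("high", "level difference")

# Example call: with a 0.5 ms delay the assumed edges fall at 1 kHz and 2 kHz.
region = classify_band(1500.0, 0.0005)
```

Under these assumed boundaries, the mid region is where the level-difference sign is needed to resolve which of two possible time differences the measured phase corresponds to.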

5. The receiving method according to claim 1, 3 or 4, wherein each output channel signal is divided into the plurality of frequency bands with bands narrow enough that each band contains mainly the signal component of only one sound source.

6. A receiving device comprising:
inter-channel time difference detection means for obtaining, from the cross-correlation of the output channel signals L(f) and R(f) of two microphones placed apart from each other, the time differences Δτ₁ and Δτ₂ (called inter-channel time differences) with which the acoustic signals from sound sources A and B reach the respective microphones;
band dividing means for applying a discrete Fourier transform to each of the channel signals L(f) and R(f) and dividing the result into frequency bands, each with a bandwidth in which mainly the signal component of only one sound source is present, to obtain band signals L(fi) and R(fi), where i = 1, ..., n and n is the number of divided bands;
band-specific inter-channel time difference detection means for determining, from the phase difference Δφi between the two signals L(fi) and R(fi) and the frequency fi, the integers ki1 and ki2 that minimize εi1 and εi2 in
\[ \Delta\tau_1 - \left\{ \frac{\Delta\phi_i}{2\pi f_i} + \frac{k_{i1}}{f_i} \right\} = \varepsilon_{i1} \]
\[ \Delta\tau_2 - \left\{ \frac{\Delta\phi_i}{2\pi f_i} + \frac{k_{i2}}{f_i} \right\} = \varepsilon_{i2} \]
and setting the inter-channel time difference Δτj (j = 1, 2) that gives the smaller of the minimized values εi1 and εi2 as the inter-channel time difference Δτij of band i;
voice signal selecting means for selecting, in band units, the signal L(fi) or R(fi) of each band i whose time difference Δτij is Δτ₁ as a voice signal component, and blocking the other band signals;
voice synthesizing means for synthesizing the selected voice signal components L(fi) over bands 1 to n and applying an inverse Fourier transform to synthesize a voice signal; and
means for outputting the synthesized voice signal.

7. A receiving device comprising:
inter-channel level difference detection means for obtaining the level difference ΔL between the output channel signals of a plurality of microphones placed apart from each other (called the inter-channel level difference);
band dividing means for dividing each output channel signal into n frequency bands to obtain band-specific channel signals;
band-specific level difference detection means for obtaining, for each same divided band, the level difference ΔLi between the band-specific channel signals (called the band-specific level difference), where i = 1, ..., n and n is the number of divided bands;
means for finding the number of bands in which the sign (+ or −) of the band-specific level difference ΔLi matches the sign of the inter-channel level difference ΔL;
determination means for determining whether the number of matching bands is greater than or equal to a predetermined value;
means for outputting, if the number is determined to be greater than or equal to the predetermined value, the one channel signal for which the inter-channel level difference ΔL is positive;
voice signal selection means for selecting as voice signal components, from the corresponding one channel signal, the band-specific channel signals according to their band-specific level differences ΔLi;
voice synthesizing means for synthesizing the voice signal components of the selected bands into a voice signal; and
means for outputting the synthesized voice signal.

8. A receiving device comprising:
inter-channel time difference detection means for detecting, from the output channel signals of a plurality of microphones placed apart from each other, the time difference with which the acoustic signal from a sound source reaches the microphones (called the inter-channel time difference);
band-specific inter-channel time difference and level difference detection means for dividing each output channel signal into a plurality of frequency bands and obtaining, for each divided band, the time difference between the output channel signals (called the band-specific inter-channel time difference) and the level difference between them (called the band-specific inter-channel level difference);
frequency region dividing means for dividing each band-divided output channel signal into three frequency regions, low, mid and high, based on the inter-channel time difference;
voice signal selection means for selecting, in band units, the voice signal component from the band-divided channel signals,
in the low frequency region, using the band-specific inter-channel time difference and based on a preset threshold,
in the mid frequency region, judging whether the band-specific inter-channel level difference is positive, calculating the band-specific inter-channel time difference based on the judgment result, and using that time difference and based on a preset threshold, and
in the high frequency region, using the band-specific inter-channel level difference and based on a preset threshold;
voice synthesizing means for synthesizing the voice signal components of the selected bands into a voice signal over the entire band; and
means for outputting the synthesized voice signal.

9. The receiving device according to claim 1, 3 or 4, wherein the means for dividing each output channel signal into a plurality of frequency bands selects bands narrow enough that the signal of each band contains mainly the signal component of only one sound source.

10. A computer-readable recording medium on which is recorded a program for causing a computer to execute each step of the receiving method according to any one of claims 1 to 5.