JP3484112B2 - Noise component suppression processing apparatus and noise component suppression processing method

Info

Publication number: JP3484112B2
Application number: JP27330699A
Authority: JP
Inventors: 博史金澤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-09-27
Filing date: 1999-09-27
Publication date: 2004-01-06
Anticipated expiration: 2019-09-27
Also published as: JP2001100800A

Abstract

PROBLEM TO BE SOLVED: To provide noise suppression processing that requires little computation and can also remove sudden noise. SOLUTION: The apparatus comprises: a means 12 that performs frequency analysis on each of the sound signals obtained by picking up sound at plural positions, yielding frequency components for each channel; a first beamformer processing means 13 that applies filter processing to the per-channel frequency components, using filter coefficients that lower the sensitivity in directions other than the desired direction, so as to suppress noise arriving from directions other than the speaker direction and obtain a target sound component; a second beamformer processing means 16 that applies the same kind of filter processing to the frequency components obtained by the means 12 so as to suppress the speaker's voice and obtain a noise component; estimating means 17 and 18 that estimate the noise direction from the filter coefficients of the first beamformer and the target sound direction from those of the second beamformer; means 14 and 15 that correct, in accordance with the estimated directions, the arrival direction of the target sound taken as input by the first beamformer and the arrival direction of the noise taken as input by the second beamformer, respectively; a means 30 for spectral subtraction (SS) processing based on the outputs of the two beamformers; a means 40 that obtains, from the output of the means 12, a directionality index D according to the time difference and amplitude difference of the arriving sounds; and a means 50 that controls the SS processing based on D and the target sound direction.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a noise component suppression apparatus and a noise component suppression method that suppress noise by using a plurality of microphones and extract a target voice.

[0002]

2. Description of the Related Art: Because an environment contains various noise sources, it is difficult to keep ambient noise out of a voice signal captured with a microphone. When a voice signal mixed with noise is reproduced, the target voice becomes hard to hear, so noise reduction processing is necessary.

[0003] One long-known technique for reducing noise mixed into voice suppresses the noise by using a plurality of microphones. Many researchers have worked on such microphone processing techniques for voice input to speech recognition devices, video conference systems, and the like. In particular, for microphone arrays based on adaptive beamformer processing, which achieves a large effect with a small number of microphones, various methods are known, such as the generalized sidelobe canceller (GSC), the Frost beamformer, and the reference signal method, as described in Reference 1 (Institute of Electronics, Information and Communication Engineers, ed.: Acoustic Systems and Digital Processing) and Reference 2 (Haykin: Adaptive Filter Theory (Prentice Hall)).

[0004] Adaptive beamformer processing generally suppresses noise with a filter that forms a null (a direction of zero sensitivity) in the arrival direction of the interfering noise.
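
As an illustration of a filter with a null in the noise direction, the following minimal sketch computes fixed weights for two microphones at a single frequency. It is not taken from the specification: the plane-wave model, the 0.1 m spacing, the sound speed of 340 m/s, and the names `steering`, `null_weights`, and `response` are all assumptions, and the weights are solved directly rather than adapted.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s, assumed value


def steering(freq_hz, angle_deg, spacing_m=0.1):
    """Two-microphone steering vector for a plane wave arriving from angle_deg."""
    delay = spacing_m * math.sin(math.radians(angle_deg)) / SOUND_SPEED
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)]


def null_weights(freq_hz, target_deg, noise_deg, spacing_m=0.1):
    """Weights w such that w^H a(target) = 1 and w^H a(noise) = 0."""
    a_t = steering(freq_hz, target_deg, spacing_m)
    a_n = steering(freq_hz, noise_deg, spacing_m)
    # Cramer's rule on the 2x2 system for c = conj(w).
    det = a_t[0] * a_n[1] - a_t[1] * a_n[0]
    c0 = a_n[1] / det
    c1 = -a_n[0] / det
    return [c0.conjugate(), c1.conjugate()]


def response(weights, freq_hz, angle_deg, spacing_m=0.1):
    """Complex sensitivity w^H a(angle) of the filter toward angle_deg."""
    a = steering(freq_hz, angle_deg, spacing_m)
    return sum(w.conjugate() * x for w, x in zip(weights, a))


if __name__ == "__main__":
    w = null_weights(1000.0, target_deg=0.0, noise_deg=60.0)
    print(abs(response(w, 1000.0, 0.0)), abs(response(w, 1000.0, 60.0)))
```

An adaptive beamformer arrives at weights of this kind by updating them from the observed signals instead of from known directions.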

[0005] However, adaptive beamformer processing has the problem that, when the actual arrival direction of the target signal differs from the assumed arrival direction, the target signal is regarded as noise and removed, so performance degrades.

[0006] To improve on this, techniques that tolerate a deviation between the assumed and actual arrival directions have been developed, as disclosed for example in Reference 3 (Hoshuyama et al.: "Robust Generalized Sidelobe Canceller Using Leaky Adaptive Filters in the Blocking Matrix", IEICE Transactions A, Vol. J79-A, No. 9, pp. 1516-1524 (1996.9)). With these techniques removal of the target signal is reduced, but the target signal may still be distorted by the deviation between the actual and assumed arrival directions.

[0007] In contrast, Japanese Patent Application No. Hei 9-9794, for example, discloses a method that uses a plurality of beamformers, successively detects the speaker direction, and corrects the input direction of the beamformer toward that direction, thereby tracking the speaker and reducing distortion of the target signal.

[0008] However, the method disclosed in Japanese Patent Application No. Hei 9-9794 performs adaptive filter processing in the time domain. Estimating the speaker direction from the filter coefficients therefore requires converting the time-domain filter coefficients to the frequency domain, which increases the amount of computation.

[0009]

Problems to Be Solved by the Invention: One technique for suppressing noise in voice is adaptive beamformer processing, in which a plurality of microphones capture the speaker's voice and the signals are passed through a filter that forms a null in the arrival direction of the interfering noise, thereby suppressing the noise component.

[0010] In this adaptive beamformer processing, when the actual arrival direction of the target signal, that is, the direction of the speaker, differs from the arrival direction assumed in advance, the target signal is regarded as noise and removed, and the voice pickup performance degrades.

[0011] To improve on this, techniques that tolerate a deviation between the assumed and actual arrival directions have been developed. Although they reduce removal of the target signal, the target signal may still be distorted by the deviation between the actual and assumed arrival directions, so a problem remains in the quality of the resulting voice.

[0012] A method has also been proposed that uses a plurality of beamformers, successively detects the speaker direction, and corrects the input direction of the beamformer toward that direction, thereby tracking the speaker and reducing distortion of the target signal. However, because this method performs adaptive filter processing in the time domain, estimating the speaker direction from the filter coefficients requires converting the time-domain filter coefficients to the frequency domain, which increases the amount of computation.

[0013] Each of the conventional techniques thus has strengths and weaknesses, and there is a strong demand for a beamformer processing technique that collects the target signal with high quality and also needs only a short processing time. In addition, a speech recognition device installed in a vehicle, on a street, or in a station encounters sudden noise while the vehicle is running, noise from oncoming vehicles moving at high speed, and the sound of passing vehicles. In a video conference, sudden noise such as the ringing of a telephone in the room or the sound of a door opening and closing as people enter and leave is also conceivable. Sufficient noise suppression performance is therefore desired even for noise of such very short duration.

[0014] An object of the present invention is therefore to provide a noise component suppression processing apparatus and a noise component suppression processing method that, by using beamformers operating in the frequency domain, greatly reduce the amount of computation, and that can also be expected to give sufficient suppression of very short noise and of moving sound sources, so that sudden noise can be handled as well.

[0015]

Means for Solving the Problems: To achieve the above object, the present invention is configured as follows.

[0016] [1] First, the apparatus comprises:
voice input means that receives a speaker's voice at two or more different positions and outputs each as a voice signal;
frequency analysis means that performs frequency analysis on each channel of the voice signals corresponding to the sound receiving positions and outputs frequency components for each channel;
first beamformer processing means that uses the frequency components of the plural channels output by the frequency analysis means to suppress, by adaptive filter processing, incoming noise other than the target voice, and outputs a signal of the target voice component;
second beamformer processing means that uses the frequency components of the plural channels output by the frequency analysis means to suppress, by adaptive filter processing, the target voice, and outputs a signal of the noise component;
noise direction estimating means that estimates the noise direction from the filter coefficients calculated by the first beamformer processing means;
target sound direction estimating means that estimates the target sound direction from the filter coefficients calculated by the second beamformer processing means;
first input direction correcting means that successively corrects a first input direction, which is the arrival direction of the target sound taken as input by the first beamformer processing means, based on the target sound direction estimated by the target sound direction estimating means;
second input direction correcting means that successively corrects a second input direction, which is the arrival direction of the noise taken as input by the second beamformer processing means, based on the noise direction estimated by the noise direction estimating means;
spectral subtraction means that performs spectral subtraction processing, a nonlinear noise suppression process, based on the output of the first beamformer processing means and the output of the second beamformer processing means;
directionality detecting means that calculates, from the frequency components output by the frequency analysis means, a directionality index based on the time difference and amplitude difference of the arriving sound; and
spectral subtraction control means that controls the spectral subtraction processing of the spectral subtraction means based on the directionality index and the target sound direction output from the target sound direction estimating means.

[0017] [2] Second, the apparatus comprises:
voice input means that receives a speaker's voice at two or more different positions and outputs each as a voice signal;
frequency analysis means that performs frequency analysis on each channel of the voice signals corresponding to the sound receiving positions and outputs frequency components for each channel;
first beamformer processing means that applies, to the frequency components of the plural channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that the sensitivity outside the desired direction is low, thereby suppressing sounds other than the voice from the speaker direction (incoming-noise suppression) and obtaining a target voice component;
second beamformer processing means that applies the same kind of adaptive filter processing to the frequency components of the plural channels so as to suppress the voice from the speaker direction and obtain a first noise component;
third beamformer processing means that applies the same kind of adaptive filter processing to the frequency components of the plural channels so as to suppress the voice from the speaker direction and obtain a second noise component;
noise direction estimating means that estimates the noise direction from the filter coefficients calculated by the first beamformer processing means;
first target sound direction estimating means that estimates a first target sound direction from the filter coefficients calculated by the second beamformer processing means;
second target sound direction estimating means that estimates a second target sound direction from the filter coefficients calculated by the third beamformer processing means;
first input direction correcting means that successively corrects a first input direction, which is the arrival direction of the target sound taken as input by the first beamformer processing means, based on either or both of the first target sound direction estimated by the first target sound direction estimating means and the second target sound direction estimated by the second target sound direction estimating means;
second input direction correcting means that, when the noise direction estimated by the noise direction estimating means lies in a predetermined first range, successively corrects a second input direction, which is the arrival direction of the noise taken as input by the second beamformer processing means, based on that noise direction;
third input direction correcting means that, when the noise direction estimated by the noise direction estimating means lies in a predetermined second range, successively corrects a third input direction, which is the arrival direction of the noise taken as input by the third beamformer processing means, based on that noise direction;
effective noise determining means that, based on whether the noise direction estimated by the noise direction estimating means arrived from the predetermined first range or from the predetermined second range, determines one of the first and second output noises to be the true noise output and outputs it, and at the same time determines which of the estimation results of the first and second target sound direction estimating means is valid and outputs that result to the first input direction correcting means;
spectral subtraction means that performs spectral subtraction processing, a nonlinear noise suppression process, based on the output of the first beamformer processing means and the output of the second beamformer processing means;
directionality detecting means that calculates, from the frequency components output by the frequency analysis means, a directionality index based on the time difference and amplitude difference of the arriving sound; and
spectral subtraction control means that controls the spectral subtraction processing of the spectral subtraction means based on the directionality index and the target sound direction output from the target sound direction estimating means.

[0018] [3] Third, a noise component suppression apparatus characterized in that the spectral subtraction means comprises: voice band power calculating means that divides the obtained voice frequency components into frequency bands and calculates the voice power of each band; noise band power calculating means that divides the obtained noise frequency components into frequency bands and calculates the noise power of each band; band weight calculating means that obtains band weight coefficients based on the per-band voice and noise powers obtained from the voice band power calculating means and the noise band power calculating means and on the output of the spectral subtraction control means; and spectrum subtracting means that suppresses background noise by multiplying the voice signal, band by band, by the band weight coefficients.
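
The band-wise processing of [3] can be sketched as follows. This passage does not give the weight formula, so the sketch assumes a common spectral subtraction gain, sqrt(1 - alpha * N / S) with a lower floor. The equal-width bands, the `is_target` flag standing in for the output of the spectral subtraction control means, and all function and parameter names are likewise assumptions made for illustration.

```python
import math


def band_powers(spectrum, n_bands):
    """Mean power of each of n_bands equal-width groups of frequency bins.

    Assumes len(spectrum) is at least n_bands.
    """
    size = len(spectrum) // n_bands
    return [
        sum(abs(x) ** 2 for x in spectrum[b * size:(b + 1) * size]) / size
        for b in range(n_bands)
    ]


def band_weights(voice_pow, noise_pow, is_target=True, alpha=1.0, floor=0.1):
    """Per-band weights from voice and noise band powers (assumed formula)."""
    if not is_target:
        # Hypothetical control input: the frame was judged to be noise,
        # so every band is attenuated down to the floor.
        return [floor] * len(voice_pow)
    weights = []
    for s, n in zip(voice_pow, noise_pow):
        gain = 1.0 - alpha * n / max(s, 1e-12)
        weights.append(max(math.sqrt(max(gain, 0.0)), floor))
    return weights


def apply_band_weights(spectrum, weights):
    """Multiply every bin by the weight of the band it belongs to."""
    size = len(spectrum) // len(weights)
    last = len(weights) - 1
    return [x * weights[min(i // size, last)] for i, x in enumerate(spectrum)]
```

Working per band rather than per bin smooths the weights over neighbouring bins, at the cost of frequency resolution.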

[0019] [4] Fourth, the spectral subtraction means is characterized by comprising: voice band power calculating means that divides the obtained voice frequency components into frequency bands and calculates the voice power of each band; noise band power calculating means that divides the obtained noise frequency components into frequency bands and calculates the noise power of each band; input band power calculating means that divides into frequency bands the frequency components obtained by frequency analysis of the input signal from the voice input means and calculates the input power of each band; band weight calculating means that obtains band weight coefficients based on the per-band voice and noise powers obtained from the voice band power calculating means and the noise band power calculating means, the per-band input power obtained by the input band power calculating means, and the output of the spectral subtraction control means; and modified spectrum subtracting means that suppresses background noise by multiplying the voice signal, band by band, by the band weight coefficients.

[0020] With the invention configured as in [1], the voice input means receives the voice uttered by the speaker at two or more different positions, and the frequency analysis means performs frequency analysis on each channel of the voice signals corresponding to the sound receiving positions and outputs frequency components for the plural channels. The first beamformer processing means applies, to the frequency components of the plural channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that the sensitivity outside the desired direction is low; this suppresses sounds other than the voice from the speaker direction (incoming-noise suppression) and yields the target voice component. The second beamformer processing means applies the same kind of adaptive filter processing to the frequency components of the plural channels so as to suppress the voice from the speaker direction and obtain the noise component. The noise direction estimating means estimates the noise direction from the filter coefficients calculated by the first beamformer processing means, and the target sound direction estimating means estimates the target sound direction from the filter coefficients calculated by the second beamformer processing means. The target sound direction correcting means successively corrects the first input direction, which is the arrival direction of the target sound taken as input by the first beamformer, based on the target sound direction estimated by the target sound direction estimating means, so the first beamformer suppresses noise components arriving from directions other than the first input direction and extracts the speaker's voice component with low noise. Likewise, the noise direction correcting means successively corrects the second input direction, which is the arrival direction of the noise taken as input by the second beamformer, based on the noise direction estimated by the noise direction estimating means, so the second beamformer suppresses components arriving from directions other than the second input direction and extracts the remaining noise component, in which the speaker's voice component has been suppressed.
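
The idea of reading a direction out of frequency-domain filter coefficients, and then nudging a beamformer's input direction toward it, can be sketched as below. This passage does not state how the estimate is computed, so the sketch assumes one plausible approach: scan candidate angles and pick the one where the filter's summed sensitivity is lowest, since the first beamformer places its null on the noise and the second on the speaker. The two-microphone plane-wave model, the smoothing rate, and the names `estimate_null_direction` and `track` are hypothetical.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s, assumed value


def steering(freq_hz, angle_deg, spacing_m):
    """Two-microphone steering vector for a plane wave arriving from angle_deg."""
    delay = spacing_m * math.sin(math.radians(angle_deg)) / SOUND_SPEED
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)]


def estimate_null_direction(weights_per_freq, freqs_hz, spacing_m=0.1, step_deg=1.0):
    """Return the scanned angle at which the summed filter sensitivity is lowest."""
    best_angle, best_power = None, float("inf")
    n_steps = int(round(180.0 / step_deg))
    for i in range(n_steps + 1):
        angle = -90.0 + i * step_deg
        power = 0.0
        for w, f in zip(weights_per_freq, freqs_hz):
            a = steering(f, angle, spacing_m)
            power += abs(sum(wi.conjugate() * ai for wi, ai in zip(w, a))) ** 2
        if power < best_power:
            best_angle, best_power = angle, power
    return best_angle


def track(previous_deg, estimate_deg, rate=0.5):
    """Move an input direction part of the way toward the new estimate."""
    return previous_deg + rate * (estimate_deg - previous_deg)
```

Because the coefficients are already per-frequency values, the scan uses them directly; time-domain coefficients would first have to be transformed to the frequency domain.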

[0021] In this way the system obtains separately a voice frequency component in which the noise component is suppressed and a noise frequency component in which the voice component is suppressed. The principal features of the invention are twofold. First, beamformers operating in the frequency domain are used as the first and second beamformers. Second, to cope with sudden noise as well, directionality detecting means is incorporated that uses short-time data to obtain with high accuracy a directionality index for deciding whether an arriving sound came from the target direction, and the spectral subtraction is controlled from this directionality index and the speaker direction obtained by the conventional processing so as to suppress sudden noise.
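
A directionality index built from the inter-channel time (phase) difference and amplitude difference of a single short frame might look like the following. The definition used here, the fraction of frequency bins whose phase and level differences match what a source in the target direction would produce, is an assumption for illustration, as are the tolerances, the two-microphone geometry, and the name `directionality_index`. This passage does not define the index.

```python
import cmath
import math


def directionality_index(ch1, ch2, freqs_hz, target_deg=0.0, spacing_m=0.1,
                         sound_speed=340.0, phase_tol=0.3, level_tol_db=3.0):
    """Fraction of bins consistent with a source in the target direction.

    ch1 and ch2 hold the complex frequency components of one short frame
    from two microphones; freqs_hz gives the frequency of each bin.
    """
    delay = spacing_m * math.sin(math.radians(target_deg)) / sound_speed
    hits = 0
    for x1, x2, f in zip(ch1, ch2, freqs_hz):
        # Phase of channel 2 relative to channel 1 expected for the target.
        expected = -2.0 * math.pi * f * delay
        phase_err = cmath.phase(x2 * x1.conjugate() * cmath.exp(-1j * expected))
        level_db = 20.0 * math.log10((abs(x2) + 1e-12) / (abs(x1) + 1e-12))
        if abs(phase_err) < phase_tol and abs(level_db) < level_tol_db:
            hits += 1
    return hits / len(freqs_hz)
```

Since the index is computed from one frame, it can react to a burst of noise before an adaptive filter has had time to converge on it.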

[0022] Accordingly, by judging target sound versus noise based on both the directionality index and the speaker direction obtained from the directivity of the beamformer filter, sound from outside the set speaker range can be removed, and short-duration signals such as sudden noise can also be removed with high accuracy. Noise suppression in a real environment can therefore be performed with very high accuracy.

[0023] In addition, using beamformers that operate in the frequency domain as the first and second beamformers greatly reduces the amount of computation.

[0024] According to the invention, in addition to the large reduction in the processing load of the adaptive filters, frequency analysis other than that applied to the input voice can be omitted, and the conversion from the time domain to the frequency domain that was needed at the time of the filter computation is also unnecessary, so the overall amount of computation is greatly reduced.

[0025] That is, in the conventional technique, spectral subtraction (hereinafter abbreviated as SS) is performed after the beamformer processing in order to suppress diffuse noise that the beamformer cannot suppress. Because SS takes a frequency spectrum as input, frequency analysis such as an FFT (fast Fourier transform) was conventionally required. A beamformer operating in the frequency domain, however, outputs a frequency spectrum, which can be used directly for SS, so the conventional step of performing an FFT specifically for SS can be omitted. The overall amount of computation is therefore greatly reduced.
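
The saving described here can be seen in a minimal sketch of the data flow: the per-channel frequency components are computed once, both beamformers operate on them bin by bin, and their outputs feed spectral subtraction with no further transform. The per-bin gain rule, the floor value, and the names `beamform`, `spectral_subtract`, and `suppress_frame` are assumptions for illustration, and the weights are taken as given rather than adapted.

```python
import math


def beamform(weights, channels):
    """Per-bin output y[k] = sum over microphones m of conj(w[m][k]) * x[m][k]."""
    n_bins = len(channels[0])
    return [
        sum(w[k].conjugate() * x[k] for w, x in zip(weights, channels))
        for k in range(n_bins)
    ]


def spectral_subtract(voice, noise, floor=0.1):
    """Scale each voice bin by a gain derived from the voice and noise powers."""
    out = []
    for s, n in zip(voice, noise):
        ps, pn = abs(s) ** 2, abs(n) ** 2
        gain = math.sqrt(max(1.0 - pn / max(ps, 1e-12), 0.0))
        out.append(s * max(gain, floor))
    return out


def suppress_frame(channels, w_voice, w_noise):
    """One frame: channels[m][k] is the spectrum of microphone m at bin k."""
    voice = beamform(w_voice, channels)  # frequency-domain output of beamformer 1
    noise = beamform(w_noise, channels)  # frequency-domain output of beamformer 2
    return spectral_subtract(voice, noise)  # no extra FFT between the stages
```

With time-domain beamformers, an FFT of each beamformer output would have to be inserted before `spectral_subtract`.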

[0026] The conversion from the time domain to the frequency domain that was required for direction estimation using the beamformer filter is also unnecessary, so the overall amount of computation can be greatly reduced.

[0027] With the configuration of [2], the voice input means receives the voice uttered by the speaker at two or more different positions, and the frequency analysis means performs frequency analysis on each channel of the voice signals corresponding to the sound receiving positions and outputs frequency components for the plural channels. The first beamformer processing means applies, to the frequency components of the plural channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that the sensitivity outside the desired direction is low; this suppresses sounds other than the voice from the speaker direction (incoming-noise suppression) and yields the target voice component. The second beamformer processing means applies the same kind of adaptive filter processing to the frequency components of the plural channels so as to suppress the voice from the speaker direction and obtain the noise component. The noise direction estimating means estimates the noise direction from the filter coefficients calculated by the first beamformer processing means, and the target sound direction estimating means estimates the target sound direction from the filter coefficients calculated by the second beamformer processing means.

【００２８】また、第１の目的音方向推定手段は前記第
２のビームフォーマ処理手段で計算されるフィルタ係数
から第１の目的音方向を推定し、第２の目的音方向推定
手段は、前記第３の適応ビームフォーマ処理手段で計算
されるフィルタ係数から第２の目的音方向を推定する。The first target sound direction estimating means estimates the first target sound direction from the filter coefficients calculated by the second beamformer processing means, and the second target sound direction estimating means estimates the second target sound direction from the filter coefficients calculated by the third adaptive beamformer processing means.

【００２９】第１の入力方向修正手段は、前記第１のビ
ームフォーマにおいて入力対象とする目的音の到来方向
である第１の入力方向を、前記第１の目的音方向推定手
段で推定された第１の目的音方向と、第２の目的音方向
推定手段で推定された第２の目的音方向のいずれか一方
または両方に基づいて逐次修正する。そして、第２の入
力方向修正手段は、前記雑音方向修正手段で推定された
雑音方向が所定の第１の範囲にある場合に、前記第２の
ビームフォーマにおいて入力対象とする雑音の到来方向
である第２の入力方向を該雑音方向に基づいて逐次修正
し、第３の入力方向修正手段は、前記雑音方向修正手段
で推定された雑音方向が所定の第２の範囲にある場合
に、前記第３のビームフォーマにおいて入力対象とする
雑音の到来方向である第３の入力方向を該雑音方向に基
づいて逐次修正する。The first input direction correcting means sequentially corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, based on one or both of the first target sound direction estimated by the first target sound direction estimating means and the second target sound direction estimated by the second target sound direction estimating means. When the noise direction estimated by the noise direction correcting means is within a predetermined first range, the second input direction correcting means sequentially corrects the second input direction, which is the arrival direction of the noise to be input to the second beamformer, based on that noise direction. When the noise direction estimated by the noise direction correcting means is within a predetermined second range, the third input direction correcting means sequentially corrects the third input direction, which is the arrival direction of the noise to be input to the third beamformer, based on that noise direction.

【００３０】従って、第２の入力方向修正手段の出力に
より第２の入力方向を修正される第２のビームフォーマ
は第２の入力方向以外から到来する成分を抑圧して残り
の雑音成分を抽出することになり、また、第３の入力方
向修正手段の出力により第３の入力方向を修正される第
３のビームフォーマは第３の入力方向以外から到来する
成分を抑圧して残りの雑音成分を抽出することになる。Therefore, the second beamformer, whose second input direction is corrected by the output of the second input direction correcting means, suppresses components arriving from directions other than the second input direction and extracts the remaining noise component. Likewise, the third beamformer, whose third input direction is corrected by the output of the third input direction correcting means, suppresses components arriving from directions other than the third input direction and extracts the remaining noise component.

【００３１】そして、有効雑音決定手段は、前記雑音方
向推定手段で推定された雑音方向が所定の第１の範囲か
ら到来したか所定の第２の範囲から到来したかに基づい
て前記第１の出力雑音と前記第２の出力雑音のいずれか
一方を真の雑音出力と決定していずれか一方の雑音を出
力すると同時に、第１の音声方向推定手段と第２の音声
方向推定手段のいずれの推定結果が有効であるかを決定
して有効な方の音声方向推定結果を第１の入力方向修正
手段へ出力する。The effective noise determining means then decides, according to whether the noise direction estimated by the noise direction estimating means arrived from the predetermined first range or from the predetermined second range, which of the first output noise and the second output noise is the true noise output, and outputs that noise. At the same time, it decides which of the estimation results of the first voice direction estimating means and the second voice direction estimating means is valid, and outputs the valid voice direction estimation result to the first input direction correcting means.
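
The selection performed by the effective noise determining means can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `TrackerOutput` and `select_effective_noise`, the use of degrees, and the example ranges are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrackerOutput:
    """Output of one noise-tracking beamformer (hypothetical container)."""
    noise_spectrum: List[complex]   # noise component extracted by this beamformer
    voice_direction_deg: float      # voice direction estimated from its filter coefficients


def select_effective_noise(
    noise_direction_deg: float,
    first_range: Tuple[float, float],
    second_range: Tuple[float, float],
    second_bf: TrackerOutput,
    third_bf: TrackerOutput,
) -> Optional[TrackerOutput]:
    """Pick the beamformer whose tracking range contains the estimated noise direction.

    The chosen output supplies both the "true" noise output and the voice
    direction estimate that is forwarded to the first input direction
    correcting means. Returns None when the noise lies in neither range
    (behaviour for that case is an assumption; the text does not specify it).
    """
    lo, hi = first_range
    if lo <= noise_direction_deg <= hi:
        return second_bf
    lo, hi = second_range
    if lo <= noise_direction_deg <= hi:
        return third_bf
    return None
```

For example, with assumed ranges of (-90, -20) and (20, 90) degrees, a noise direction of -45 degrees selects the second beamformer's output.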

【００３２】この結果、目的音方向修正手段は、前記第
１のビームフォーマにおいて入力対象となる目的音の到
来方向である第１の入力方向を、前記決定した方の目的
音方向推定手段で得た目的音方向に基づいて逐次修正す
るので、第１のビームフォーマは第１の入力方向以外か
ら到来する雑音成分を抑圧して話者の音声成分を低雑音
で抽出することになる。As a result, the target sound direction correcting means sequentially corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, based on the target sound direction obtained by whichever target sound direction estimating means was determined to be valid. The first beamformer therefore suppresses noise components arriving from directions other than the first input direction and extracts the speaker's voice component with low noise.

【００３３】このように本システムは雑音成分を抑圧し
た音声周波数成分と、音声成分を抑圧した雑音周波数成
分とを別々に得ることができるが、この発明の最大の特
徴は、第１及び第２のビームフォーマとして、周波数領
域で動作するビームフォーマを用いるようにした点、そ
して、本発明では、突発性の雑音にも対処できるよう
に、短時間データを用いて到来音が目的方向から到来し
たかどうかを決めるための方向性の指標を高精度に求め
る方向性検出手段を組み入れ、方向性指標と従来処理に
おける話者方向とからスペクトルサブトラクションを制
御して突発性雑音を抑圧するようにした点にある。As described above, the present system can separately obtain a voice frequency component in which the noise component is suppressed and a noise frequency component in which the voice component is suppressed. The most significant features of the present invention are, first, that beamformers operating in the frequency domain are used as the first and second beamformers and, second, that, in order to also cope with sudden noise, directionality detection means is incorporated which uses short-time data to obtain, with high accuracy, a directionality index for deciding whether an incoming sound arrived from the target direction, and spectral subtraction is controlled based on this directionality index and on the speaker direction obtained in the conventional processing so as to suppress sudden noise.

【００３４】これによると、上述した方向性指標とビー
ムフォーマのフィルタの指向性から求めた話者方向との
両方に基づいて目的音／雑音の判定を行うことにより、
設定した話者範囲以外からの音声を除去できるととも
に、突発雑音など、継続時間の短い信号も高精度で除去
できるようになるため、実環境における雑音抑圧処理を
極めて高精度に行うことが可能となる。According to this, the target sound / noise decision is made based on both the directionality index described above and the speaker direction obtained from the directivity of the beamformer filter. This makes it possible to remove voices from outside the set speaker range and also to remove short-duration signals such as sudden noise with high accuracy, so that noise suppression in a real environment can be performed with extremely high accuracy.

【００３５】また、第１及び第２のビームフォーマとし
て、周波数領域で動作するビームフォーマを用いるよう
にしたことによって、計算量を大幅に削減することがで
きるようになる。Further, since the beam formers operating in the frequency domain are used as the first and second beam formers, the amount of calculation can be significantly reduced.

【００３６】そしてこの発明によると、適応フィルタの
処理量が大幅に低減されるのに加え、入力音声に対する
周波数分析以外の周波数分析処理を省略することがで
き、かつ、フィルタ演算時に必要であった時間領域から
周波数領域ヘの変換処理も不要となり、全体の演算量を
大幅に削減することができる。According to the present invention, in addition to greatly reducing the processing load of the adaptive filter, frequency analysis other than that applied to the input voice can be omitted, and the time-domain to frequency-domain conversion previously required for the filter computation also becomes unnecessary, so the overall amount of computation can be greatly reduced.

【００３７】また、本発明では、雑音追尾に監視領域を
全く異ならせた雑音追尾用のビームフォーマを設けてあ
り、それぞれの出力からそれぞれ音声方向を推定させる
と共に、それぞれの推定結果からいずれが有効な雑音追
尾をしているかを判断して、有効と判断された方のビー
ムフォーマのフィルタ係数による音声方向の推定結果を
第１の目的音方向修正手段に与えることで第１の目的音
方向修正手段は、前記第１のビームフォーマにおいて入
力対象となる目的音の到来方向である第１の入力方向
を、前記目的音方向推定手段で推定された目的音方向に
基づいて逐次修正するので、第１のビームフォーマは第
１の入力方向以外から到来する雑音成分を抑圧して話者
の音声成分を低雑音で抽出することができ、雑音源が移
動してもこれを見失うことなく追尾して抑圧することが
できるようになるものである。Further, in the present invention, noise-tracking beamformers whose monitoring regions are entirely different from each other are provided for noise tracking. The voice direction is estimated from each of their outputs, and it is judged from the respective estimation results which beamformer is tracking the noise effectively. The voice direction estimate based on the filter coefficients of the beamformer judged to be effective is supplied to the first target sound direction correcting means, which sequentially corrects the first input direction, that is, the arrival direction of the target sound to be input to the first beamformer, based on the target sound direction estimated by the target sound direction estimating means. The first beamformer can therefore suppress noise components arriving from directions other than the first input direction and extract the speaker's voice component with low noise, and even if the noise source moves, it can be tracked and suppressed without being lost.

【００３８】従来技術においては、２ｃｈ、すなわち、
２本のマイクロホンだけでも目的音源の追尾を可能とす
べく、雑音追尾用のビームフォーマを雑音抑圧のビーム
フォーマとは別に１個用いるが、例えば、雑音源が目的
音の方向を横切って移動したような場合、雑音の追尾精
度が低下することがあった。In the prior art, in order to enable tracking of the target sound source even with only 2 ch, that is, two microphones, one beamformer for noise tracking is used in addition to the beamformer for noise suppression. However, when, for example, the noise source moves across the direction of the target sound, the noise tracking accuracy may deteriorate.

【００３９】しかし、本発明では、雑音を追尾するビー
ムフォーマを複数用いて各々別個の追尾範囲を受け持つ
ようにしたことにより、上記のような場合でも追尾精度
の低下を抑止できるようになる。However, according to the present invention, since a plurality of beam formers for tracking noise are used to handle separate tracking ranges, it is possible to prevent the tracking accuracy from deteriorating even in the above case.

【００４０】また、［３］項の構成の場合、音声帯域パ
ワー計算手段は、得られた音声周波数のスペクトル成分
を、周波数帯域毎に分割して帯域毎の音声パワーを計算
し、雑音帯域パワー計算手段は、前記得られた雑音周波
数のスペクトル成分を、周波数帯域毎に分割して帯域毎
の雑音パワーを計算する。そして、帯域重み計算手段
は、前記音声帯域パワー計算手段と雑音帯域パワー計算
手段とから得られる音声と雑音の周波数帯域パワーとス
ペクトルサブトラクション制御手段の出力とに基き、帯
域毎の重み係数を求め、スペクトル減算手段は、音声信
号の周波数帯域毎にこの重み係数をかけて背景雑音を抑
圧する。Further, in the case of configuration [3], the voice band power calculating means divides the obtained spectral components of the voice frequencies into frequency bands and calculates the voice power for each band, and the noise band power calculating means divides the obtained spectral components of the noise frequencies into frequency bands and calculates the noise power for each band. The band weight calculating means then obtains a weighting coefficient for each band based on the per-band voice and noise powers obtained from the voice band power calculating means and the noise band power calculating means and on the output of the spectral subtraction control means, and the spectrum subtraction means suppresses background noise by applying this weighting coefficient to each frequency band of the voice signal.
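
The band power and band weight computation described above can be sketched as follows. The text does not give the weight formula at this point, so the gain `max(1 - Pn/Pv, floor)` used here is a common spectral-subtraction choice adopted purely as an assumption; the function names, the equal-width band split, and the `floor` parameter are likewise hypothetical. The influence of the spectral subtraction control output on the weights is not modelled.

```python
from typing import List, Sequence


def band_powers(spectrum: Sequence[complex], n_bands: int) -> List[float]:
    """Split a spectrum into n_bands contiguous bands and return the mean power of each."""
    n = len(spectrum)
    powers = []
    for b in range(n_bands):
        start = b * n // n_bands
        stop = (b + 1) * n // n_bands
        band = spectrum[start:stop]
        powers.append(sum(abs(x) ** 2 for x in band) / max(len(band), 1))
    return powers


def band_weights(voice_power: Sequence[float],
                 noise_power: Sequence[float],
                 floor: float = 0.1) -> List[float]:
    """Per-band weight from voice and noise band powers (assumed formula, see lead-in)."""
    weights = []
    for pv, pn in zip(voice_power, noise_power):
        if pv <= 0.0:
            weights.append(floor)            # no voice energy: keep only the floor
        else:
            weights.append(max(1.0 - pn / pv, floor))
    return weights


def apply_band_weights(spectrum: Sequence[complex], weights: Sequence[float]) -> List[complex]:
    """Multiply every bin by the weight of the band it belongs to."""
    n, n_bands = len(spectrum), len(weights)
    out = list(spectrum)
    for b, w in enumerate(weights):
        for k in range(b * n // n_bands, (b + 1) * n // n_bands):
            out[k] = spectrum[k] * w
    return out
```

With a voice spectrum of amplitude 2 and a noise spectrum of amplitude 1 in every bin, each band has voice power 4 and noise power 1, giving a weight of 0.75 per band under this assumed formula.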

【００４１】この構成によれば、ビームフォーマでは抑
圧できない方向性のない雑音（背景雑音）は、本発明シ
ステムのビームフォーマで得ることのできる目的音声成
分と雑音成分を利用し、これをスペクトルサブトラクシ
ョン処理することで抑圧する。すなわち、本システムで
は、ビームフォーマとして目的音声成分抽出用と雑音成
分抽出用の２つのビームフォーマを備えているが、これ
らのビームフォーマの出力である目的音声成分と雑音成
分を利用してスペクトルサブトラクション処理すること
により、方向性のない背景雑音成分の抑圧を行う。スペ
クトルサブトラクション（ＳＳ）処理は雑音抑圧処理と
して知られるが、一般的に行われるスペクトルサブトラ
クション（ＳＳ）処理は、１チャネルのマイクロホン
（つまり、１本のマイクロホン）を用い、このマイクロ
ホンの出力から音声のない区間において雑音のパワーを
推定するため、非定常な雑音が音声に重畳している場合
には対処できない。また、２チャネルのマイクロホン
（つまり、２本のマイクロホン）を用いて、一方を雑音
収集用、片方を雑音重畳音声収集用とする場合にも、両
マイクロホンの設置場所を離す必要があり、その結果、
音声に重畳する雑音と、雑音収集用マイクロホンで取り
込む雑音との位相がずれ、スペクトルサブトラクション
処理しても雑音抑圧の改善効果は大きく上がらない。According to this configuration, non-directional noise (background noise) that cannot be suppressed by a beamformer is suppressed by spectral subtraction using the target voice component and the noise component that the beamformers of the present system provide. That is, the present system has two beamformers, one for extracting the target voice component and one for extracting the noise component, and performs spectral subtraction using their outputs to suppress non-directional background noise components. Spectral subtraction (SS) is a known noise suppression technique, but as generally practiced it uses a one-channel microphone (that is, a single microphone) and estimates the noise power from the microphone output during sections without voice, so it cannot cope with non-stationary noise superimposed on the voice. Even when two-channel microphones (that is, two microphones) are used, one for collecting noise and the other for collecting noise-superimposed voice, the two microphones must be installed apart from each other. As a result, the noise superimposed on the voice and the noise captured by the noise-collecting microphone are out of phase, and spectral subtraction yields little improvement in noise suppression.

【００４２】しかし、本発明では、雑音成分を取り出す
ビームフォーマを用意して、このビームフォーマの出力
を用いるようにしたため、位相のずれは補正されてお
り、従って、非定常雑音の場合でも高精度なスペクトル
サブトラクション処理を実現できる。さらに、周波数領
域のビームフォーマの出力を利用しているため、周波数
分析を省略してスペクトルサブトラクションが可能であ
り、従来より少ない演算量で非定常雑音を抑圧できる。However, in the present invention, a beamformer that extracts the noise component is provided and its output is used, so the phase shift is corrected and highly accurate spectral subtraction can be achieved even for non-stationary noise. Furthermore, because the outputs of frequency-domain beamformers are used, spectral subtraction can be performed without an additional frequency analysis, and non-stationary noise can be suppressed with less computation than before.

【００４３】更に［４］項の発明は、上記［３］の発明
の雑音抑圧装置において、音声入力手段から得られた入
力信号を周波数分析した入力信号の周波数成分を周波数
帯域毎に分割し、帯域毎の入力パワーを計算する入力信
号帯域パワー計算手段を設けてあり、音声信号の周波数
帯域毎に重みをかけて背景雑音を抑圧する処理を実施さ
せるようにしており、この構成の場合、音声帯域パワー
計算手段は、得られた音声周波数のスペクトル成分を、
周波数帯域毎に分割して帯域毎の音声パワーを計算し、
雑音帯域パワー計算手段は、前記得られた雑音周波数の
スペクトル成分を、周波数帯域毎に分割して帯域毎の雑
音パワーを計算する。Further, the invention of item [4] is the noise suppression apparatus of the invention of item [3] further provided with input signal band power calculating means, which divides into frequency bands the frequency components obtained by frequency-analyzing the input signal from the voice input means and calculates the input power for each band, so that background noise is suppressed by weighting each frequency band of the voice signal. In this configuration, the voice band power calculating means divides the obtained spectral components of the voice frequencies into frequency bands and calculates the voice power for each band, and the noise band power calculating means divides the obtained spectral components of the noise frequencies into frequency bands and calculates the noise power for each band.

【００４４】また、入力帯域パワー計算手段があり、こ
の入力帯域パワー計算手段は、音声入力手段から得られ
た入力信号を周波数分析して得た入力音声の周波数スペ
クトル成分を受けて、これを周波数帯域毎に分割し、帯
域毎の入力パワーを計算する。There is also input band power calculating means, which receives the frequency spectrum components of the input voice obtained by frequency-analyzing the input signal from the voice input means, divides them into frequency bands, and calculates the input power for each band.

【００４５】そして、帯域重み計算手段は、前記音声帯
域パワー計算手段と雑音帯域パワー計算手段と入力信号
帯域パワー計算手段とから得られる音声入力の周波数帯
域パワーと雑音入力の周波数帯域パワー、そして、音声
と雑音の混入した入力の周波数帯域パワーおよびスペク
トルサブトラクション制御手段の出力とに基き、帯域毎
の重み係数を求め、スペクトル減算手段は、音声信号の
周波数帯域毎にこの重み係数をかけて背景雑音を抑圧す
る。The band weight calculating means then obtains a weighting coefficient for each band based on the frequency band power of the voice, the frequency band power of the noise, and the frequency band power of the input in which voice and noise are mixed, obtained from the voice band power calculating means, the noise band power calculating means, and the input signal band power calculating means respectively, and on the output of the spectral subtraction control means. The spectrum subtraction means suppresses background noise by applying this weighting coefficient to each frequency band of the voice signal.

【００４６】この［４］項の発明においては、［３］項
の発明におけるスペクトルサブトラクション（ＳＳ）処
理において、更に雑音成分についてそのパワーを修正す
るようにしたことにより、一層高精度に雑音抑圧を行う
ことを可能とするものである。すなわち、［３］項の発
明では雑音源のパワ−Ｎが小さいという仮定をおいたた
め、スペクトルサブトラクション（ＳＳ）処理を行うと
雑音源の成分が音声に重畳している部分では歪みが大き
くなる可能性が残るが、ここでは入力信号のパワーを用
いて第３の発明でのスペクトルサブトラクション処理に
おける帯域重みの計算を修正するようにした。In the invention of the above item [4], in the spectral subtraction (SS) processing of the invention of the item [3], the power of the noise component is further corrected, so that the noise suppression can be performed with higher accuracy. It makes it possible to do. That is, in the invention of the item [3], it is assumed that the power N of the noise source is small. Therefore, when the spectral subtraction (SS) process is performed, the distortion may be large in the portion where the noise source component is superimposed on the voice. However, here, the power of the input signal is used to correct the calculation of the band weight in the spectral subtraction processing in the third invention.

【００４７】これにより、方向を持つ雑音成分および方
向のない雑音成分を抑圧した歪みの少い音声成分のみの
抽出ができるようになる。As a result, it becomes possible to extract only the voice component with a small distortion by suppressing the noise component having a direction and the noise component having no direction.

【００４８】[0048]

【発明の実施の形態】以下、本発明の実施の形態につき
図面を参照して説明する。BEST MODE FOR CARRYING OUT THE INVENTION Embodiments of the present invention will be described below with reference to the drawings.

【００４９】（基本構成例）図１は、本発明による雑音
成分抑圧装置の基本構成図である。このシステムの特徴
は、音声入力部として指向性のある少なくとも２チャネ
ル分のマイクロホンを互いに指向性の軸方向を傾けて配
置し、これらのマイクロホンで得た音声信号をそれぞれ
チャネル別に周波数分析し、これを所望方向外の感度が
低くなるように計算したフィルタ係数を用いての適応ビ
ームフォーマ処理することで、話者方向からの音声を抑
圧して雑音成分を得、この雑音成分を抑圧する処理を施
して雑音の少ない話者音声成分を得ると云った雑音抑圧
処理装置を用いることにより、雑音抑制した話者音声成
分と雑音成分とを得るようにすると共に、この雑音抑圧
処理装置に、短時間周波数分析に基づく方向性検出部を
追加し、ビームフォーマ処理で抑圧できない突発性雑音
や高速移動音源等の到来をこの方向性検出部で検出し、
この検出結果と雑音抑圧処理装置にて求めた話者音声成
分と雑音成分とを用いて行うスペクトルサブトラクショ
ンを制御することにより、話者方向の許容範囲を高精度
に設定できる話者追尾機能を確保しつつ、しかも、突発
性雑音、高速移動音源等を抑圧することを可能としてい
る。(Basic Configuration Example) FIG. 1 is a basic configuration diagram of a noise component suppressing apparatus according to the present invention. In this system, microphones for at least two channels, each having directivity, are arranged as the voice input section with their directivity axes inclined relative to each other, and the voice signals obtained by these microphones are frequency-analyzed channel by channel. A noise suppression processing device applies adaptive beamformer processing to these frequency components, using filter coefficients calculated so that sensitivity outside the desired direction becomes low, to suppress the voice from the speaker direction and obtain a noise component, and applies processing that suppresses this noise component to obtain a speaker voice component with little noise; it thus yields both a noise-suppressed speaker voice component and a noise component. A directionality detection unit based on short-time frequency analysis is added to this noise suppression processing device to detect the arrival of sudden noise, fast-moving sound sources, and the like that cannot be suppressed by the beamformer processing. By using this detection result to control the spectral subtraction performed with the speaker voice component and the noise component obtained by the noise suppression processing device, the system secures a speaker tracking function in which the allowable range of the speaker direction can be set with high accuracy, while also suppressing sudden noise, fast-moving sound sources, and the like.

【００５０】すなわち、図１において、１０は音声入力
部と周波数分析部および雑音抑制処理装置とを備える話
者追尾マイクロホンアレイであり、３０はスペクトルサ
ブトラクション処理部、４０は方向検出部、５０はスペ
クトルサブトラクション制御部である。That is, in FIG. 1, 10 is a speaker tracking microphone array including a voice input section, a frequency analysis section, and a noise suppression processing apparatus, 30 is a spectrum subtraction processing section, 40 is a direction detection section, and 50 is a spectrum. It is a subtraction control unit.

【００５１】話者追尾マイクロホンアレイ１０の構成要
素である音声入力部は話者者の発声した音声を少なくと
も異なる２箇所以上の位置で受音して音声信号を得るた
めのものであって、少なくとも指向性のある２個のマイ
クロホンを軸を互いに傾けて配置することにより、集音
方向がずれたものとなるようにしたものである。また、
話者追尾マイクロホンアレイ１０の構成要素である周波
数分析部は、前記受音位置に対応する音声信号のチャネ
ル毎に周波数分析を行って複数チャネルの周波数成分を
出力するものである。The voice input unit, which is a component of the speaker tracking microphone array 10, receives the voice uttered by the speaker at two or more different positions to obtain voice signals; at least two directional microphones are arranged with their axes inclined relative to each other so that their sound collection directions differ. The frequency analysis unit, also a component of the speaker tracking microphone array 10, performs frequency analysis for each channel of the voice signal corresponding to each sound receiving position and outputs frequency components of a plurality of channels.

【００５２】また、話者追尾マイクロホンアレイ１０の
構成要素である雑音抑制処理装置は、例えば、周波数分
析部にて得られる前記複数チャネルの周波数成分につい
て、所望方向外の感度が低くなるように計算したフィル
タ係数を用いての適応フィルタ処理を施すことにより前
記話者方向からの音声以外の音声を抑圧する到来雑音抑
圧処理を行い、目的音声成分を得る第１のビームフォー
マ処理手段と、前記周波数分析部にて得られる前記複数
チャネルの周波数成分について、所望方向外の感度が低
くなるように計算したフィルタ係数を用いての適応フィ
ルタ処理を施すことにより前記話者方向からの音声を抑
圧し、雑音成分を得る第２のビームフォーマ処理手段
と、前記第１のビームフォーマ処理手段で計算されるフ
ィルタ係数から雑音方向を推定する雑音方向推定手段
と、前記第２のビームフォーマ処理手段で計算されるフ
ィルタ係数から目的音方向を推定する目的音方向推定手
段と、前記第１のビームフォーマにおいて入力対象とな
る目的音の到来方向である第１の入力方向を、前記目的
音方向推定手段で推定された目的音方向に基づいて逐次
修正する目的音方向修正手段と、前記第２のビームフォ
ーマにおいて入力対象とする雑音の到来方向である第２
の入力方向を、前記雑音方向推定手段で推定された雑音
方向に基づいて逐次修正する雑音方向修正手段とを具備
すると云った如きの構成である。The noise suppression processing device, which is a component of the speaker tracking microphone array 10, is configured, for example, to comprise: first beamformer processing means that applies adaptive filter processing to the frequency components of the plurality of channels obtained by the frequency analysis unit, using filter coefficients calculated so that sensitivity outside the desired direction becomes low, thereby performing incoming noise suppression that suppresses sounds other than the voice from the speaker direction and obtaining a target voice component; second beamformer processing means that applies adaptive filter processing to the same frequency components, using filter coefficients calculated so that sensitivity outside the desired direction becomes low, thereby suppressing the voice from the speaker direction and obtaining a noise component; noise direction estimating means that estimates the noise direction from the filter coefficients calculated by the first beamformer processing means; target sound direction estimating means that estimates the target sound direction from the filter coefficients calculated by the second beamformer processing means; target sound direction correcting means that sequentially corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, based on the target sound direction estimated by the target sound direction estimating means; and noise direction correcting means that sequentially corrects the second input direction, which is the arrival direction of the noise to be input to the second beamformer, based on the noise direction estimated by the noise direction estimating means.

【００５３】方向検出部４０は、前記周波数分析部から
出力された周波数成分から到来音の時間差と振幅の差に
基づいた方向性の指標Ｄを計算するものであり、スペク
トルサブトラクション処理部３０は、前記第１のビーム
フォーマ出力と第２のビームフォーマ出力に基づいて非
線形の雑音抑圧処理であるスペクトルサブトラクション
処理を行うためのものである。The direction detecting section 40 calculates the index D of directionality based on the time difference and the amplitude difference of the incoming sound from the frequency component output from the frequency analyzing section, and the spectral subtraction processing section 30 This is for performing a spectral subtraction process which is a nonlinear noise suppression process based on the first beam former output and the second beam former output.

【００５４】また、スペクトルサブトラクション制御部
５０は方向検出部４０にて得られた方向性指標Ｄと前記
雑音抑制処理装置の目的音方向推定手段から出力された
目的音方向とに基づいて前記スペクトルサブトラクショ
ン手段の処理を制御するものである。The spectral subtraction control unit 50 controls the processing of the spectral subtraction means based on the directionality index D obtained by the direction detection unit 40 and the target sound direction output from the target sound direction estimating means of the noise suppression processing device.

【００５５】本雑音成分抑圧処理全体の流れを図２に示
す。FIG. 2 shows the flow of the entire noise component suppressing process.

【００５６】まず、基本的な雑音抑圧処理について説明
する。ここでは２チャネル、すなわち、マイクロホン２
本で捉えた２系統の音声信号を用いる例で説明するが、
３チャネル以上となった場合でも処理の方法は同様であ
る。First, the basic noise suppression processing will be described. The description here uses two channels, that is, two voice signals captured by two microphones, but the processing method is the same with three or more channels.

【００５７】２つのマイクロホンを持つ話者追尾マイク
ロホンアレイ１０の音声入力部から入力された音声は、
周波数分析部に送られ、例えば高速フーリエ変換（ＦＦ
Ｔ）等により周波数成分が計算される。次に、話者追尾
マイクロホンアレイ１０の雑音抑圧処理装置における構
成要素の一つである第１のビームフォーマでは、２チャ
ネルの入力に対する周波数成分から、周波数領域の適応
フィルタにより雑音を抑圧し、目的音の方向の周波数成
分を出力する。ここでは、目的音の方向をマイクロホン
の正面とするように、目的音方向推定部からの出力を用
いて入力方向修正部１で位相を整える操作を行う。The voice input from the voice input section of the speaker tracking microphone array 10, which has two microphones, is sent to the frequency analysis unit, where frequency components are calculated by, for example, a fast Fourier transform (FFT). Next, the first beamformer, one of the components of the noise suppression processing device of the speaker tracking microphone array 10, suppresses noise in the frequency components of the two-channel input with a frequency-domain adaptive filter and outputs the frequency components from the direction of the target sound. Here, the input direction correction unit 1 uses the output of the target sound direction estimation unit to align the phase so that the target sound appears to arrive from directly in front of the microphones.

【００５８】また、話者追尾マイクロホンアレイ１０の
雑音抑圧処理装置における構成要素の一つである第２の
ビームフォーマ１６では、２チャネルの入力に対する周
波数成分から、周波数領域の適応フィルタにより目的音
を抑圧し、雑音の方向の周波数成分を出力する。ここで
は、雑音の方向をマイクロホンの正面と仮定し、２つの
マイクロホンに対して雑音が同時に到着したと見なせる
ように、雑音方向推定部からの出力を用いて第２の入力
方向修正部で位相を整える操作（整相）を行う。Further, in the second beam former 16 which is one of the constituent elements in the noise suppression processing device of the speaker tracking microphone array 10, the target sound is obtained from the frequency components corresponding to the inputs of the two channels by the adaptive filter in the frequency domain. Suppress and output the frequency component in the noise direction. Here, it is assumed that the direction of noise is in front of the microphones, and the output from the noise direction estimation unit is used to determine the phase in the second input direction correction unit so that it can be considered that the noises have arrived at the two microphones at the same time. Perform the adjustment operation (phase adjustment).
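
The phase alignment step can be illustrated in the frequency domain as a per-bin phase rotation. The sketch below assumes a far-field plane wave, two microphones at a known spacing, and a sign convention in which a positive direction means the sound reaches channel 2 later than channel 1; `align_phase` and all of its parameters are hypothetical names, not taken from the patent.

```python
import cmath
import math
from typing import List, Sequence


def align_phase(ch2_spectrum: Sequence[complex],
                direction_deg: float,
                mic_spacing_m: float,
                sample_rate_hz: float,
                fft_size: int,
                sound_speed_m_s: float = 340.0) -> List[complex]:
    """Rotate the phase of channel 2 so a source at direction_deg looks like it is in front.

    A plane wave from direction_deg reaches channel 2 with an extra delay
    tau = d * sin(theta) / c, i.e. X2(f) = X1(f) * exp(-j*2*pi*f*tau).
    Multiplying bin k by exp(+j*2*pi*f_k*tau) removes that delay, so both
    channels appear to receive the source simultaneously.
    """
    tau = mic_spacing_m * math.sin(math.radians(direction_deg)) / sound_speed_m_s
    aligned = []
    for k, x in enumerate(ch2_spectrum):
        freq_hz = k * sample_rate_hz / fft_size   # centre frequency of FFT bin k
        aligned.append(x * cmath.exp(2j * math.pi * freq_hz * tau))
    return aligned
```

A source directly in front (0 degrees) leaves the spectrum unchanged, which matches the description that the beamformer input direction is treated as the front of the microphones after alignment.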

【００５９】ここで、話者追尾マイクロホンアレイ１０
の雑音抑圧処理装置における構成要素の一つである雑音
方向推定部では、第１のビームフォーマの適応フィルタ
から雑音方向を推定し、目的音方向推定部、第２のビー
ムフォーマの適応フィルタから目的音方向を推定する。
なお、これらの処理は例えば８［ｍｓｅｃ］等の固定時
間毎に行われる。The noise direction estimation unit, one of the components of the noise suppression processing device of the speaker tracking microphone array 10, estimates the noise direction from the adaptive filter of the first beamformer, and the target sound direction estimation unit estimates the target sound direction from the adaptive filter of the second beamformer. These processes are performed at fixed time intervals, for example every 8 ms.

【００６０】次に、本発明の重要なポイントである方向
検出部４０とスペクトルサブトラクション制御部５０に
ついて説明する。方向検出部４０では、その詳細は後述
するが、短時間ＦＦＴなどの周波数分析に基づき、２つ
のマイクロホンの位相差のみならず、各チャネルの入力
信号のパワー比を用いて、方向性指標を計算する。スペ
クトルサブトラクション制御部５０は、話者追尾マイク
ロホンアレイ１０の雑音抑圧処理装置における構成要素
の一つである目的音方向推定部で推定された目的音方向
（話者方向）と、方向性検出部４０で計算された方向性
指標Ｄに基づいて、図２に示すように、方向性指標Ｄの
値に応じた３通りの信号のうちのいずれかをスペクトル
サブトラクション処理部３０に送り、スペクトルサブト
ラクション処理部３０では、その３種類の信号にしたが
って拡散性雑音の抑圧処理であるスペクトルサブトラク
ション処理を行う。Next, the direction detection unit 40 and the spectral subtraction control unit 50, which are important points of the present invention, will be described. As detailed later, the direction detection unit 40 calculates the directionality index from frequency analysis such as a short-time FFT, using not only the phase difference between the two microphones but also the power ratio of the input signals of the channels. The spectral subtraction control unit 50 sends one of three signals to the spectral subtraction processing unit 30, as shown in FIG. 2, selected according to the value of the directionality index D, based on the target sound direction (speaker direction) estimated by the target sound direction estimation unit, which is one of the components of the noise suppression processing device of the speaker tracking microphone array 10, and on the directionality index D calculated by the directionality detection unit 40. The spectral subtraction processing unit 30 then performs spectral subtraction, which is processing that suppresses diffuse noise, according to which of the three signals it receives.

【００６１】ここで、スペクトルサブトラクション制御
部５０から出力される３種類の信号とは、“０”と
“１”と“２”の３通りの信号を指し、これらのうち、
信号“０”はほとんど雑音のみの区間であることを表
し、信号“１”は大きな突発性雑音が音声区間に重畳し
ている区間であることを表し、信号“２”はほぼ音声の
みの区間であることを表す。Here, the three signals output from the spectral subtraction control unit 50 are "0", "1", and "2". Signal "0" indicates a section containing almost only noise, signal "1" indicates a section in which large sudden noise is superimposed on a voice section, and signal "2" indicates a section containing almost only voice.

【００６２】まず、初期設定として、例えば、話者方向
の許容範囲を２つのマイクホンの中心から±２０゜、方
向性指標Ｄのしきい値を“１．０”とする（図２のステ
ップＳ２０）。このしきい値は一例であって、実際には
使用環境で実験より調整することが望ましい。First, as initial settings, for example, the permissible range in the speaker direction is set to ± 20 ° from the center of the two microphones, and the threshold value of the directionality index D is set to "1.0" (step S20 in FIG. 2). ). This threshold is an example, and it is actually preferable to adjust it by experimentation in the usage environment.

【００６３】ここで、方向性検出部４０から送られてく
る方向性指標Ｄが、しきい値（１．０）以下か否かを判
定し（図２のステップＳ２１，Ｓ２２，Ｓ２３）、その
結果、しきい値以下であれば、つぎに目的音方向（話者
方向）が設定範囲内かどうかを判定し（図２のステップ
Ｓ２４）、設定範囲内であれば、スペクトルサブトラク
ション処理部３０に信号２”を送る（図２のステップＳ
２５）。また、設定範囲外であれば、信号“０”を送る
（図２のステップＳ２６）。It is then determined whether the directionality index D sent from the directionality detection unit 40 is at or below the threshold (1.0) (steps S21, S22, S23 in FIG. 2). If it is at or below the threshold, it is next determined whether the target sound direction (speaker direction) is within the set range (step S24 in FIG. 2); if it is within the set range, signal "2" is sent to the spectral subtraction processing unit 30 (step S25 in FIG. 2), and if it is outside the set range, signal "0" is sent (step S26 in FIG. 2).

【００６４】方向性指標Ｄがしきい値以上であり、目的
音方向が設定範囲内であれば（図２のステップＳ２１，
Ｓ２２，Ｓ２３，Ｓ２７）、話者方向から到来する音声
に突発性の雑音が重畳していると判定して信号“１”を
送り（図２のステップＳ２８）、方向性指標Ｄがしきい
値以上であり、目的音方向が設定範囲外であれば（図２
のステップＳ２１，Ｓ２２，Ｓ２３，Ｓ２７）、信号
“０”を送る（図２のステップＳ２２６）。If the directionality index D is at or above the threshold and the target sound direction is within the set range (steps S21, S22, S23, S27 in FIG. 2), it is determined that sudden noise is superimposed on the voice arriving from the speaker direction, and signal "1" is sent (step S28 in FIG. 2). If the directionality index D is at or above the threshold and the target sound direction is outside the set range (steps S21, S22, S23, S27 in FIG. 2), signal "0" is sent (step S26 in FIG. 2).
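
The decision flow of the spectral subtraction control unit 50 can be summarised in a few lines of code. This is a sketch under stated assumptions: the function name and parameters are hypothetical, the defaults are the example initial values from the text (threshold 1.0, allowable range of plus or minus 20 degrees), and the direction is measured in degrees from the centre of the two microphones. Because the text says both "at or below" and "at or above" the threshold, the tie case D equal to the threshold is resolved here in favour of the first test, which is an assumption.

```python
def ss_control_signal(d_index: float,
                      speaker_direction_deg: float,
                      d_threshold: float = 1.0,
                      allowed_half_width_deg: float = 20.0) -> int:
    """Return the control signal 0, 1 or 2 for the spectral subtraction processing unit.

    0: almost only noise (speaker direction outside the allowed range)
    1: sudden noise superimposed on voice (D above threshold, direction in range)
    2: almost only voice (D at or below threshold, direction in range)
    """
    in_range = abs(speaker_direction_deg) <= allowed_half_width_deg
    if not in_range:
        return 0            # either branch of the D test sends "0" when out of range
    if d_index <= d_threshold:
        return 2            # steps S24 -> S25
    return 1                # steps S27 -> S28
```

For instance, `ss_control_signal(0.5, 10.0)` yields 2, `ss_control_signal(1.5, 10.0)` yields 1, and any call with a direction of 30 degrees yields 0 under the default range.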

【００６５】スペクトルサブトラクション処理部３０で
は、スペクトルサブトラクション制御部５０からの信号
が“０”の時は、ほとんど雑音区間と見なせるので、最
小重みをかけて、出力信号をカットする。そして、これ
を最終的な音声周波数成分として出力する。In the spectral subtraction processing unit 30, when the signal from the spectral subtraction control unit 50 is "0", it can be regarded almost as a noise section, so that the output signal is cut by applying the minimum weight. Then, this is output as the final audio frequency component.

【００６６】また、スペクトルサブトラクション制御部
５０からの信号が“１”の時は、音声区間に突発性雑音
が重畳していると見なし、第２のビームフォーマからの
出力を雑音成分として扱う２チャネルのスペクトルサブ
トラクション処理を行う。そして、これを最終的な音声
周波数成分として出力する。When the signal from the spectral subtraction control unit 50 is "1", sudden noise is regarded as being superimposed on the voice section, and two-channel spectral subtraction is performed, treating the output of the second beamformer as the noise component. The result is output as the final voice frequency component.

【００６７】When the signal from the spectral subtraction control unit 50 is "2", the spectral subtraction processing unit 30 regards the section as containing speech only and performs one-channel spectral subtraction on the output of the first beamformer. The result is output as the final speech frequency component.
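
The three cases for the signals "0", "1", and "2" can be sketched as follows. The text does not give the subtraction formula, so the power-domain subtraction with a spectral floor, the `min_weight` and `floor` values, and the separately supplied stationary noise estimate `noise_est` for the one-channel case are all assumptions made for illustration.

```python
import numpy as np

def spectral_subtraction(bf1, bf2, noise_est, signal, min_weight=0.01, floor=0.01):
    """Apply the processing selected by the control signal to one frame.

    bf1       : complex spectrum from the first beamformer (target sound)
    bf2       : complex spectrum from the second beamformer (noise)
    noise_est : assumed stationary noise power estimate, one value per bin
    signal    : control signal 0, 1 or 2
    """
    if signal == 0:
        # Almost entirely noise: cut the output with the minimum weight.
        return min_weight * bf1
    if signal == 1:
        # Two-channel case: the second beamformer output is the noise reference.
        noise_pow = np.abs(bf2) ** 2
    elif signal == 2:
        # One-channel case: use the stationary noise estimate only.
        noise_pow = np.asarray(noise_est, dtype=float)
    else:
        raise ValueError("signal must be 0, 1 or 2")
    power = np.maximum(np.abs(bf1) ** 2, 1e-12)
    # Power gain, limited from below by `floor`; the amplitude gain is its square root.
    gain = np.sqrt(np.maximum(1.0 - noise_pow / power, floor))
    return gain * bf1
```

The phase of `bf1` is kept unchanged in every case; only the magnitude is scaled.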

【００６８】As an alternative control method, when the signal is "1", the section may be regarded as a noise section and processed in the same way as for the signal "0", because a large sudden noise is superimposed on the speech.

【００６９】This yields a system that retains a speaker tracking function in which the allowable range of the speaker direction can be set with high accuracy, while also suppressing sudden noise, fast-moving sound sources, and the like.

【００７０】The above is an outline of the present invention. The components needed to build such a system are described in detail below.

【００７１】<Configuration Example 1 of Speaker Tracking Microphone Array 10> A configuration example of the speaker tracking microphone array 10 will now be described. This example corresponds to the content of claim 0A1.

【００７２】FIG. 3 is a block diagram showing a system configuration as Configuration Example 1 of the speaker tracking microphone array 10, and shows the basic configuration of a noise suppression apparatus according to an embodiment of the present invention. The present invention is a technique that enables speaker tracking even with the minimum number of microphones, namely 2 ch (ch: channel), that is, two microphones. The description here therefore uses 2 ch, but the processing method is the same for 3 ch or more.

【００７３】In FIG. 3, 11 is a speech input unit, 12 is a frequency analysis unit, 13 is a first beamformer, 14 is a first input direction correction unit, 15 is a second input direction correction unit, 16 is a second beamformer, 17 is a noise direction estimation unit, and 18 is a target sound direction estimation unit (speech direction estimation unit).

【００７４】Of these, the speech input unit 11 receives the speech uttered by the speaker whose speech is to be collected (the target speech) at two or more different positions. Specifically, it captures the speech with two microphones installed at different points and converts it into electrical signals. The frequency analysis unit 12 performs frequency analysis on each channel of the speech signal, each channel corresponding to a microphone receiving position, and outputs frequency components for the plural channels. Specifically, the speech signal captured by the first microphone (the speech signal of the first channel ch1) and the speech signal captured by the second microphone (the speech signal of the second channel ch2) are each separately converted from time-domain signal components into frequency-domain component data, for example by a fast Fourier transform, so that frequency spectrum data is output for each channel.
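
The per-channel frequency analysis can be sketched with NumPy's real FFT. The Hann window and the function name are assumptions; the text only states that each channel is transformed separately, for example by a fast Fourier transform.

```python
import numpy as np

def analyze_frame(frame_ch1, frame_ch2):
    """Convert one time-domain frame per channel into frequency spectrum data.

    Each channel is windowed (Hann window, an assumption) and transformed
    separately; the result is one complex spectrum per channel.
    """
    frame_ch1 = np.asarray(frame_ch1, dtype=float)
    frame_ch2 = np.asarray(frame_ch2, dtype=float)
    window = np.hanning(len(frame_ch1))
    spec_ch1 = np.fft.rfft(window * frame_ch1)
    spec_ch2 = np.fft.rfft(window * frame_ch2)
    return spec_ch1, spec_ch2
```

With an assumed sampling rate of 16 kHz, a 128-sample frame corresponds to 8 ms and gives 65 frequency bins spaced 125 Hz apart.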

【００７５】The first beamformer 13 extracts the frequency components of the target speech from the plural-channel frequency component outputs of the frequency analysis unit 12, in this case the speech signals of ch1 and ch2. Using the frequency components (frequency spectrum data) of ch1 and ch2, it suppresses incoming noise other than the target speech by adaptive filtering and thereby extracts the frequency components arriving from the target sound source direction. The second beamformer 16 extracts the frequency components arriving from the noise source direction from the plural-channel frequency component outputs of the frequency analysis unit 12, in this case the speech signals of ch1 and ch2. Using the frequency components (frequency spectrum data) of ch1 and ch2, it suppresses components other than the sound from the noise source direction by adaptive filtering and thereby extracts the frequency spectrum component data arriving from the noise source direction.

【００７６】The noise direction estimation unit 17 estimates the noise direction from the filter coefficients calculated by the first beamformer 13. Specifically, it estimates the noise direction using parameters such as the filter coefficients obtained from the adaptive filter of the first beamformer 13 and outputs data corresponding to the estimate. The target sound direction estimation unit (speech direction estimation unit) 18 estimates the target sound direction from the filter coefficients calculated by the second beamformer 16. Specifically, from parameters such as the filter coefficients used in the adaptive filter of the second beamformer 16, it estimates the direction of the sound that this beamformer suppresses as noise, namely the target sound, and outputs data corresponding to the estimate.

【００７７】The first input direction correction unit 14 corrects the input direction of the beamformer to the true target sound direction. It generates an output for successively correcting the first input direction, which is the arrival direction of the target sound to be input to the first beamformer 13, on the basis of the target sound direction estimated by the target sound direction estimation unit 18, and supplies this output to the first beamformer 13. Specifically, the first input direction correction unit 14 converts the estimate data output by the target sound direction estimation unit 18 into angle information α for the current target sound source direction and outputs it to the first beamformer 13 as target angle information α.

【００７８】The second input direction correction unit 15 corrects the input direction of the second beamformer 16 to the noise direction. It generates an output for successively correcting the second input direction, which is the arrival direction of the noise to be input to the second beamformer 16, on the basis of the noise direction estimated by the noise direction estimation unit 17, and supplies this output to the second beamformer 16. Specifically, the second input direction correction unit 15 converts the estimate data output by the noise direction estimation unit 17 into angle information for the current target noise source direction and outputs it to the second beamformer 16 as target angle information α.

【００７９】A configuration example of the beamformers 13 and 16 is given here.

【００８０】<Configuration Example of Beamformer> The beamformers 13 and 16 used in the system of the present invention are configured as shown in FIG. 4(a). That is, so that the signal component to be extracted can be obtained from the input speech, each of the beamformers 13 and 16 comprises a phase shifter 100, which sets the input direction of the beamformer to the arrival direction of the signal component to be extracted, and a beamformer body 101, which suppresses components arriving from directions other than that arrival direction.

【００８１】The phase shifter 100 comprises a correction vector generation unit 100a and multiplication means 100b and 100c. The beamformer body 101 comprises addition means 101a, 101b, and 101c and an adaptive filter 101d.

【００８２】The correction vector generation unit 100a receives the angle information α from the input direction correction unit 14 or 15 as input direction information and generates a correction vector corresponding to α. The multiplication means 100b multiplies the ch1 frequency spectrum component data output from the frequency analysis unit 12 by the correction vector and outputs the result. The multiplication means 100c multiplies the ch2 frequency spectrum component data output from the frequency analysis unit 12 by the correction vector and outputs the result.

【００８３】The addition means 101a adds the output of the multiplication means 100b and the output of the multiplication means 100c and outputs the sum. The addition means 101b outputs the difference between the output of the multiplication means 100b and the output of the multiplication means 100c. The addition means 101c outputs, as the beamformer output, the difference between the output of the addition means 101a and the output of the adaptive filter 101d. The adaptive filter 101d is a digital filter that filters the output of the addition means 101b, and its filter coefficients (parameters) are successively updated so that the output of the addition means 101c is minimized.

【００８４】In this example the system uses two microphones, the first and second microphones m1 and m2, giving two channels of collected speech (ch1, ch2). In this case, setting the input direction of the beamformer means, as shown in FIG. 4(b), applying a delay to the frequency components of the two speech channels ch1 and ch2 so as to align their phases (phasing), such that the speech signal from the direction of the input target can be regarded as arriving at both microphones m1 and m2 equivalently at the same time. In the configuration of FIG. 4, this is realized by the phase shifter 100 adjusting the phase shift in accordance with the angle information α output by the input direction correction units 14 and 15.

【００８５】That is, in the configuration of FIG. 4, the phase shifter 100 generates, in the correction vector generation unit 100a, a correction vector corresponding to the input direction to be corrected (angle information α), and the multiplication means 100b and 100c multiply the ch1 and ch2 signals, respectively, by this correction vector. The phase shifter 100 configured in this way aligns the phases as follows.

【００８６】For example, consider the omnidirectional microphone arrangement indicated by m1 and m2 in FIG. 4(b), and consider phase-correcting the signals so that a speaker, the target sound source, located at point P1 appears to be at point P2. The two microphones are separated by the distance d. In this case, so that the phase of the speaker's speech signal detected by the first microphone m1 (ch1) and the phase of the speaker's speech signal detected by the second microphone m2 (ch2) become the same, the speech signal of the first microphone m1 (ch1) is multiplied by the complex conjugate of the complex number W1 corresponding to the propagation time difference τ, where τ = r / c, r = d·sin α, and W1 = (cos ωτ, sin ωτ), that is, W1 = cos ωτ + j·sin ωτ. Here, c is the speed of sound, d is the distance between the microphones, α is the angle through which the speaker, the source of the target sound, has moved as seen from the microphone m1, j is the imaginary unit, and ω is the angular frequency.

【００８７】In other words, for the speech of a target sound source that has moved to the angle α, multiplying by the complex conjugate of W1 shifts the phase of the signal captured by the first microphone m1 (ch1) so that it is in phase with the signal captured by the second microphone m2.

【００８８】The signal of the second microphone m2 (ch2) is multiplied by the complex conjugate of the complex number W2 = (1, 0). This means that no angle correction is applied to the signal of the second microphone m2 (ch2).

【００８９】The vector {W1, W2} formed from the complex numbers W1 and W2 is generally called a direction vector, and its complex conjugate vector {W1*, W2*} is called a correction vector.

【００９０】If a correction vector is generated in accordance with the angle information α and the frequency spectrum components of ch1 and ch2 are multiplied by it, the output of the first microphone m1 is corrected to have the same phase as that of the second microphone m2 even though the sound source has moved from P1 to P2. In terms of phase, it is as if the first and second microphones m1 and m2 were at equal distances from the sound source at position P2.
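
The correction vector can be sketched as below. The speed of sound (340 m/s), the function names, and the sign convention (ch1 is modeled as ch2 multiplied by W1 for a source at angle α) are assumptions chosen so that multiplying by the complex conjugate brings the two channels into phase.

```python
import numpy as np

SPEED_OF_SOUND = 340.0  # m/s, assumed value

def correction_vector(alpha_deg, omega, mic_distance, c=SPEED_OF_SOUND):
    """Return the correction vector (W1*, W2*) for the angle alpha.

    omega        : angular frequency of each bin (rad/s)
    mic_distance : distance d between the two microphones (m)
    """
    tau = mic_distance * np.sin(np.deg2rad(alpha_deg)) / c  # tau = d*sin(alpha)/c
    w1 = np.exp(1j * np.asarray(omega) * tau)               # W1 = cos(wt) + j*sin(wt)
    w2 = np.ones_like(w1)                                   # W2 = (1, 0): no correction
    return np.conj(w1), np.conj(w2)

def align(x1, x2, alpha_deg, omega, mic_distance):
    """Multiply the ch1 and ch2 spectra by the correction vector."""
    c1, c2 = correction_vector(alpha_deg, omega, mic_distance)
    return x1 * c1, x2 * c2
```

After `align`, a source at angle α contributes the same phase to both channels, so their sum reinforces it and their difference cancels it.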

【００９１】In this embodiment there are two beamformers. The first beamformer 13 uses its phase shifter 100 to delay the frequency components of ch1 (or ch2) by the above method so that the direction of the target sound source becomes the input direction, and the second beamformer 16 uses its phase shifter 100 to delay the frequency components of ch1 (or ch2) by the above method so that the noise source direction becomes the input direction. Each thereby aligns the phases of the two channels. However, for sound components arriving from directions other than the arrival direction of the target sound S, that is, for the noise component N, the phase is left uncorrected at both the first and second microphones m1 and m2, so a time difference remains between the timings at which the noise is detected by the first microphone m1 and by the second microphone m2.

【００９２】The output of the first microphone m1, phase-corrected by the phase shifter 100 for the speech signal detected from the sound source in the target sound direction (ch1 frequency spectrum data consisting of the target speech component S and the noise component N), and the uncorrected output of the second microphone m2 (ch2 frequency spectrum data consisting of the target speech component S and the noise component N′) are each input to the addition means 101a and 101b. The addition means 101a adds the ch1 output and the ch2 output, giving the power component of twice the target speech S plus the noise components N + N′. The addition means 101b takes the difference between the ch1 output (S + N) and the ch2 output (S + N′), that is, (S + N) − (S + N′) = N − N′, giving the power component of the noise alone. The addition means 101c then takes the difference between the output of the addition means 101a and the output of the adaptive filter 101d. This difference is used as the beamformer output and is also fed back to the adaptive filter 101d.

【００９３】The adaptive filter 101d is a digital filter that filters the output of the addition means 101b so as to extract the frequency spectrum of the sound component arriving from the direction corresponding to the current search direction. It varies the search angle for incoming signals successively in 1° steps and produces its maximum output when the search angle matches the direction of the incoming signal. Accordingly, when the incident direction of the incoming signal and the search angle match, the output (N − N′) of the adaptive filter 101d is at its maximum. Since the output (N − N′) of the adaptive filter 101d is the power of the noise component, supplying the output at its maximum to the addition means 101c and subtracting it from the output (2S + N + N′) of the addition means 101a cancels the noise component N as far as possible and achieves noise suppression. In this state, therefore, the output of the addition means 101c is at its minimum.

【００９４】For this reason, the adaptive filter 101d successively changes the signal arrival direction search angle in 1° steps (the per-direction sensitivity in 1° steps) and its filter coefficients (parameters) so that the output of the addition means 101c is minimized. The incident direction of the incoming signal and the search angle (the incident direction of the incoming signal and the sensitivity to that direction) then come to match, so the adaptive filter 101d controls these quantities while keeping the output of the addition means 101c at its minimum.
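
The beamformer body 101 described above (sum path, difference path, and an adaptive filter driven to minimize the output of 101c) can be sketched per frequency bin as follows. The text does not name the adaptation rule; the normalized LMS update, the step size `mu`, the single complex coefficient per bin, and the class name are assumptions, and the explicit 1° search angle mechanism is not modeled here.

```python
import numpy as np

class TwoChannelBeamformer:
    """Frequency-domain sketch of the beamformer body 101 (assumed NLMS update)."""

    def __init__(self, n_bins, mu=0.5, eps=1e-12):
        self.w = np.zeros(n_bins, dtype=complex)  # adaptive filter 101d, one tap per bin
        self.mu = mu                              # NLMS step size (assumed)
        self.eps = eps                            # guards against division by zero

    def process(self, x1, x2, adapt=True):
        """Process one frame of phase-aligned spectra x1 (ch1) and x2 (ch2)."""
        total = x1 + x2                 # addition means 101a: 2S + N + N'
        diff = x1 - x2                  # addition means 101b: N - N'
        out = total - self.w * diff     # addition means 101c: beamformer output
        if adapt:
            # Feed the output back and move w so that |out|^2 decreases.
            self.w += self.mu * out * np.conj(diff) / (np.abs(diff) ** 2 + self.eps)
        return out
```

Because the target component S is identical in both aligned channels, it never appears in `diff`, so the adaptation can only remove the noise part of `total`. Once the filter has converged on noise-only frames, the output approaches 2S.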

【００９５】As a result of this control, the beamformer can extract the speech component arriving from the target direction. When the noise component is to be extracted as the target sound, the same control is applied with the above target sound regarded as noise.

【００９６】The beamformer body 101 is not limited to a generalized sidelobe canceller (GSC). Various other types, such as a Frost beamformer, can be applied on the same principle, and the present invention is therefore not particularly limited in this respect.

【００９７】The operation of the system configured in this way will now be described. The system is characterized in that it extracts and outputs the speech frequency components of the target sound and the noise frequency components separately.

【００９８】First, the speech input unit 11, which has a plurality of microphones (in this example two in total, the first and second microphones m1 and m2), captures the speech of ch1 and ch2. The two channels of speech signals ch1 and ch2 input from the speech input unit 11 (the first channel ch1 is the speech from the first microphone m1 and the second channel ch2 is the speech from the second microphone m2) are sent to the frequency analysis unit 12, where processing such as a fast Fourier transform (FFT) is performed to obtain the frequency components (frequency spectrum) of each channel.

【００９９】The per-channel frequency components obtained by the frequency analysis unit 12 are supplied to both the first and second beamformers 13 and 16.

【０１００】The first beamformer 13 aligns the phases of the two channels of frequency component input in accordance with the direction of the target sound, then suppresses noise by processing them with the frequency-domain adaptive filter as described above, and outputs the frequency components arriving from the direction of the target sound.

【０１０１】More specifically, the first input direction correction unit 14 supplies the following angle information (α) to the first beamformer 13. Using the output from the target sound direction estimation unit (speech direction estimation unit) 18, the first input direction correction unit 14 gives the first beamformer 13, as the input direction correction amount, the angle information (α) needed to align the input phases of the two channels of frequency components so that the target sound appears to arrive from directly in front of the microphones.

【０１０２】As a result, the first beamformer 13 corrects the target sound direction in accordance with this correction amount (α) and suppresses sound arriving from directions other than the target sound direction, thereby suppressing the noise component and extracting the target sound.

【０１０３】That is, the target sound direction estimation unit 18 learns the noise source direction, as seen by the second beamformer 16, from the parameters of the adaptive filter in the second beamformer 16, which extracts the noise component, and outputs a value reflecting it. The first input direction correction unit 14 generates an input direction correction amount (α) in accordance with the output from the target sound direction estimation unit 18 and corrects the target sound direction in the first beamformer 13 in accordance with this correction amount (α). The first beamformer 13 is thereby made to suppress sound arriving from directions other than the target sound direction, so that the noise component is suppressed and the target sound is extracted.

【０１０４】In other words, for the second beamformer 16 the noise is the target sound, so its phases are aligned to the noise. As a result, the second beamformer 16 treats the speaker's sound source as a noise source, and the adaptive filter built into that beamformer processes the sound from the speaker's sound source, so the parameters of the adaptive filter of the second beamformer 16 yield an output reflecting the direction of the speaker's sound source. Therefore, the noise source direction that the target sound direction estimation unit 18 learns from the parameters of the adaptive filter in the second beamformer 16 reflects the direction of the speaker's sound source, which is the target sound. Accordingly, if the target sound direction estimation unit 18 outputs a value reflecting the parameters of the adaptive filter in the second beamformer 16, the first input direction correction unit 14 generates an input direction correction amount (α) in accordance with that output, and the target sound direction in the first beamformer 13 is corrected in accordance with this correction amount, then the first beamformer 13 can be made to suppress sound arriving from directions other than the target sound direction.

【０１０５】The second beamformer 16 suppresses the target sound in the two channels of frequency component input by means of a frequency-domain adaptive filter and outputs the frequency components arriving from the direction of the noise. Specifically, the direction of the noise is assumed to be directly in front of the microphones, and the second input direction correction unit 15 aligns the phases (phasing) using the output from the noise direction estimation unit 17 so that the noise can be regarded as arriving at the two microphones at the same time.

【０１０６】That is, the noise direction estimation unit 17 learns the noise source direction from the parameters of the adaptive filter in the first beamformer 13, which extracts the speaker's speech component, and outputs a value reflecting it. The second input direction correction unit 15 generates an input direction correction amount (α) in accordance with the output from the noise direction estimation unit 17 and supplies it to the second beamformer 16. The second beamformer 16 corrects its noise direction in accordance with this correction amount and suppresses sound arriving from directions other than this direction, thereby extracting only the noise component.

【０１０７】Here, the noise direction estimation unit 17 estimates the noise direction from the adaptive filter of the first beamformer 13, and the target sound direction estimation unit 18 estimates the target sound direction from the adaptive filter of the second beamformer 16.

【０１０８】These processes are performed at short fixed intervals, for example every 8 ms. This fixed interval is hereinafter called a frame.

【０１０９】In this way, the first beamformer 13 can extract the speech component of the target sound (speaker), and the second beamformer 16 can extract the noise component.

【０１１０】If the apparatus is installed in a quiet conference room in which a video conference system is installed, and is used to extract the speaker's speech for that system, the noise to be removed is unlikely to be a seriously loud disturbance. In such a case, the target sound (speaker) component extracted by the first beamformer 13 can be returned to a time-domain speech signal by an inverse Fourier transform and then output as sound from a loudspeaker or transmitted, so that it can be used as noise-reduced speaker speech.

【０１１１】The processing procedure of the direction estimation units 17 and 18 is described next.

【０１１２】[Processing Procedure of Direction Estimation Units] FIG. 5 shows the processing procedure of the direction estimation units 17 and 18.

【０１１３】 This processing is performed for each frame. First, initial settings are made (step S1). As shown in the dotted-line frame in FIG. 5, the initial settings define the "target sound tracking range" as 0° ± θr (for example, θr = 20°) and set the remaining range as the noise search range.

【０１１４】 When the initial settings are complete, the process proceeds to step S2, in which a direction vector is generated. The sensitivity for each direction is then calculated, and the per-direction sensitivities are accumulated over frequency (steps S3 and S4).

【０１１５】 After this has been carried out for all frequencies and directions, the minimum accumulated value is found, and the direction having that minimum accumulated value is taken as the signal arrival direction (steps S5 and S6).

【０１１６】 Specifically, in steps S2 to S4, the inner product of the filter coefficient W(k) and the direction vector S(k, θ) is calculated for each frequency component over a predetermined range of directions in 1° steps, giving the sensitivity in each direction, and the sensitivities are then summed over all frequency components. In steps S7 and S8, among the per-direction accumulated values obtained by summing the sensitivities over all frequency components, the direction whose value is the minimum is taken as the signal arrival direction.
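The direction search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the two-microphone far-field direction vector, the microphone spacing, the speed of sound, and the helper names (`direction_vector`, `estimate_direction`, `null_filter`) are all assumptions introduced here. The sensitivity is taken as the magnitude of the inner product of W(k) and S(k, θ), summed over frequency, and the direction with the smallest total is returned.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s (assumed)
MIC_SPACING = 0.1    # m, spacing of an assumed two-microphone array


def direction_vector(freq_hz, theta_deg):
    """Assumed far-field direction vector S(k, theta) for two microphones."""
    delay = MIC_SPACING * math.sin(math.radians(theta_deg)) / SOUND_SPEED
    return [1.0 + 0.0j, cmath.exp(-2j * math.pi * freq_hz * delay)]


def estimate_direction(filters, freqs_hz, candidate_degs):
    """Return the candidate direction with the smallest accumulated sensitivity.

    filters[k] holds the per-channel coefficients W(k) for freqs_hz[k].
    """
    best_theta, best_total = None, float("inf")
    for theta in candidate_degs:
        total = 0.0
        for w, freq in zip(filters, freqs_hz):
            s = direction_vector(freq, theta)
            # Sensitivity toward theta: magnitude of the inner product of W and S.
            inner = sum(wc.conjugate() * sc for wc, sc in zip(w, s))
            total += abs(inner)
        if total < best_total:
            best_theta, best_total = theta, total
    return best_theta


def null_filter(freq_hz, theta_deg):
    """Hypothetical coefficients whose sensitivity vanishes toward theta_deg."""
    s = direction_vector(freq_hz, theta_deg)
    return [s[0], -s[1]]
```

For example, with filters built by `null_filter` for a source at 40°, searching the candidates `range(21, 91)` in 1° steps recovers that direction. With only two microphones, directions θ and 180° − θ produce the same delay, so this sketch restricts the search to one side of the array.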

【０１１７】 The processing procedure shown in FIG. 5 is the same for both the noise direction estimation unit 17 and the target sound direction estimation unit 18.

【０１１８】 In this way, the noise direction estimation unit 17 estimates the noise direction, and the target sound direction estimation unit 18 estimates the target sound direction. Each estimation result is supplied to the corresponding input direction correction unit 14 or 15.

【０１１９】 The first input direction correction unit 14, having received the noise direction estimation result, averages the input direction used up to the previous frame with the direction estimation result of the current frame, calculates a new input direction, and outputs it to the phase shift unit 100 of the beamformer. Likewise, the second input direction correction unit 15, having received the target sound direction estimation result, averages the input direction used up to the previous frame with the direction estimation result of the current frame, calculates a new input direction, and outputs it to the phase shift unit 100 of the beamformer.

【０１２０】 The averaging is performed, for example, using a coefficient β as in the following equation.

【０１２１】
θ1(n) = θ1(n−1)·(1−α) + E(n)·β
Here, θ1 is the input direction of the sound, n is the index of the processing frame, and E is the direction estimation result of the current frame. The coefficient β may be made variable based on the output power of the beamformer.
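The averaging step can be illustrated with a short sketch. It assumes a single smoothing coefficient, that is, it reads the factor printed as (1 − α) as (1 − β) so that the two weights sum to one; that reading is an assumption made for this example, as is the function name `update_input_direction`.

```python
def update_input_direction(prev_deg, estimate_deg, beta):
    """Blend the previous input direction with the current frame's estimate.

    Assumes the weights are (1 - beta) and beta, which is an interpretation
    of the printed equation rather than something the text states outright.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return prev_deg * (1.0 - beta) + estimate_deg * beta
```

A small β makes the input direction follow the per-frame estimates slowly and smoothly; a β close to 1 makes it follow them almost immediately.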

【０１２２】 When the beamformer is a GSC, direction estimation has conventionally required converting the time-domain filter coefficients into the frequency domain. In the present invention, however, the adaptive filter of the GSC operates on the frequency spectrum, applying directional sensitivity in its filter computation to extract components from outside the target direction. The filter coefficients used in this computation are therefore obtained in the frequency domain from the outset, so the conventional conversion of time-domain filter coefficients into the frequency domain is unnecessary. Accordingly, even though the system of the present invention uses a GSC, processing can be made faster by the amount saved in omitting that conversion.

【０１２３】 [Overall Processing Procedure] FIG. 6 shows the overall processing procedure of the system. This processing is performed for each frame.

【０１２４】 First, initial settings are made (step S11). The tracking range for the target sound direction is set to 0° ± θr (for example, θr = 20°); the search range of the noise direction estimation unit 17 is set to θr < φ1 < 180° − θr and −180° + θr < φ1 < −θr; and the search range of the target sound direction estimation unit 18 is set to −θr < φ2 < θr.
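As an illustration of these initial settings, the two search ranges can be enumerated in 1° steps. The function names and the use of integer degrees are assumptions of this sketch; the strict inequalities are taken from the ranges stated above.

```python
def target_search_angles(theta_r=20):
    """Integer directions with -theta_r < phi < theta_r (target sound range)."""
    return list(range(-theta_r + 1, theta_r))


def noise_search_angles(theta_r=20):
    """Integer directions outside the target range, excluding its boundaries.

    Covers -180 + theta_r < phi < -theta_r and theta_r < phi < 180 - theta_r.
    """
    negative_side = list(range(-180 + theta_r + 1, -theta_r))
    positive_side = list(range(theta_r + 1, 180 - theta_r))
    return negative_side + positive_side
```

With θr = 20°, the target range contains the 39 directions from −19° to 19°, and the noise range contains the directions from −159° to −21° and from 21° to 159°.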

【０１２５】 The initial value of the input direction of the target sound is set to θ1 = 0°, and the initial value of the input direction of the noise is set to θ2 = 90°.

【０１２６】 Once the initial settings are complete, the processing of the first beamformer 13 is performed first (step S12) and the noise direction is estimated (step S13). If the noise direction is within the range φ2, the input direction of the second beamformer 16 is corrected (steps S14 and S15); otherwise it is not corrected (step S14).

【０１２７】 Next, the processing of the second beamformer 16 is performed (step S16) and the direction of the target sound is estimated (step S17). If the estimated target sound direction is within the range φ1, the input direction of the first beamformer 13 is corrected (steps S18 and S19); otherwise nothing is done, and processing moves on to the next frame.
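The range check that guards each correction (steps S14 and S18) can be sketched as below. The permitted range is passed in as a list of open intervals, so the sketch does not depend on which symbol names which range; the function names and the averaging weight `beta` are assumptions introduced for illustration.

```python
def within(angle_deg, intervals):
    """True if the angle lies strictly inside any (low, high) interval."""
    return any(low < angle_deg < high for low, high in intervals)


def gated_correction(current_deg, estimate_deg, intervals, beta=0.5):
    """Correct the input direction only when the estimate is in range.

    Out-of-range estimates leave the input direction unchanged, mirroring
    the "otherwise it is not corrected" branch of the per-frame procedure.
    """
    if not within(estimate_deg, intervals):
        return current_deg
    return current_deg * (1.0 - beta) + estimate_deg * beta
```

For instance, with a permitted noise range of (20°, 160°) and (−160°, −20°), an estimate of 60° moves an input direction of 90° to 75°, while an estimate of 5° is ignored.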

【０１２８】 The above example is characterized by using beamformers that operate in the frequency domain, which greatly reduces the amount of computation.

【０１２９】 That is, the apparatus comprises: voice input means for receiving the voice uttered by a speaker at two or more different positions; frequency analysis means for performing frequency analysis on each channel of the audio signals corresponding to the sound receiving positions and outputting frequency components for a plurality of channels; first beamformer processing means for obtaining a target voice component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside the desired direction is low, thereby performing arrival noise suppression that suppresses sound other than the voice from the speaker direction; second beamformer processing means for obtaining a noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside the desired direction is low, thereby suppressing the voice from the speaker direction; noise direction estimation means for estimating the noise direction from the filter coefficients calculated by the first beamformer processing means; target sound direction estimation means for estimating the target sound direction from the filter coefficients calculated by the second beamformer processing means; target sound direction correction means for successively correcting a first input direction, which is the arrival direction of the target sound to be input to the first beamformer, based on the target sound direction estimated by the target sound direction estimation means; and noise direction correction means for successively correcting a second input direction, which is the arrival direction of the noise to be input to the second beamformer, based on the noise direction estimated by the noise direction estimation means.

【０１３０】 The voice input means receives the voice uttered by the speaker at two or more different positions, and the frequency analysis means performs frequency analysis on each channel of the audio signals corresponding to those sound receiving positions and outputs frequency components for a plurality of channels. The first beamformer processing means applies, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside the desired direction is low, thereby performing arrival noise suppression that suppresses sound other than the voice from the speaker direction and obtaining the target voice component. The second beamformer processing means applies, to the same frequency components, adaptive filter processing using filter coefficients calculated so that sensitivity outside the desired direction is low, thereby suppressing the voice from the speaker direction and obtaining the noise component. The noise direction estimation means estimates the noise direction from the filter coefficients calculated by the first beamformer processing means, and the target sound direction estimation means estimates the target sound direction from the filter coefficients calculated by the second beamformer processing means. Because the target sound direction correction means successively corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, based on the target sound direction estimated by the target sound direction estimation means, the first beamformer suppresses noise components arriving from directions other than the first input direction and extracts the speaker's voice component with low noise. Likewise, because the noise direction correction means successively corrects the second input direction, which is the arrival direction of the noise to be input to the second beamformer, based on the noise direction estimated by the noise direction estimation means, the second beamformer suppresses components arriving from directions other than the second input direction and extracts the noise component that remains after the speaker's voice component has been suppressed.

【０１３１】 In this way, the system can separately obtain a voice frequency component in which the noise component is suppressed and a noise frequency component in which the voice component is suppressed. The principal feature of the invention is that beamformers operating in the frequency domain are used as the first and second beamformers, which greatly reduces the amount of computation.

【０１３２】 According to the invention, the processing load of the adaptive filter is greatly reduced. In addition, frequency analysis other than that applied to the input audio can be omitted, and the conversion from the time domain to the frequency domain previously required for the filter computation also becomes unnecessary, so the overall amount of computation is greatly reduced.

【０１３３】 That is, in the prior art, spectral subtraction (hereinafter abbreviated as SS) is performed after the beamformer processing in order to suppress diffuse noise that the beamformer cannot suppress. Because SS takes a frequency spectrum as its input, frequency analysis such as an FFT (fast Fourier transform) was conventionally required. A beamformer operating in the frequency domain, however, outputs a frequency spectrum that can be passed directly to SS, so the conventional FFT step performed specifically for SS can be omitted. The overall amount of computation is therefore greatly reduced.
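To show why no extra FFT is needed, the sketch below applies spectral subtraction directly to per-bin complex spectra such as those the two beamformers output. The power-subtraction rule with an over-subtraction factor and a spectral floor is a common textbook form chosen here as an assumption; this passage does not specify the SS rule, and the function and parameter names are hypothetical.

```python
def spectral_subtraction(speech_spec, noise_spec, over_subtraction=1.0, floor=0.01):
    """Subtract the noise power from the speech power in each frequency bin.

    speech_spec and noise_spec are lists of complex bins for one frame,
    assumed to be the outputs of the speech and noise beamformers.
    The phase of the speech bin is kept; only its magnitude is scaled.
    """
    cleaned = []
    for y, n in zip(speech_spec, noise_spec):
        power_y = abs(y) ** 2
        # Subtract the noise power, but never fall below a fraction of the input power.
        power_s = max(power_y - over_subtraction * abs(n) ** 2, floor * power_y)
        gain = (power_s / power_y) ** 0.5 if power_y > 0.0 else 0.0
        cleaned.append(gain * y)
    return cleaned
```

Both inputs are already frequency spectra, so the function involves no transform at all, which is the saving the paragraph above describes.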

【０１３４】 The conversion from the time domain to the frequency domain that was previously required for direction estimation using the beamformer's filter also becomes unnecessary, which further reduces the overall amount of computation.

【０１３５】 Next, an example is described in which tracking remains accurate even when a noise source moves across the range of the target sound direction.

【０１３６】 <Configuration Example 2 of the Speaker Tracking Microphone Array 10> Another configuration example of the speaker tracking microphone array 10 is described.

【０１３７】 This example uses two beamformers that track noise, so that tracking remains accurate even when a noise source moves across the range of the target sound direction.

【０１３８】 FIG. 7 shows the overall configuration of configuration example 2 of the speaker tracking microphone array 10. In FIG. 7, 11 is a voice input unit, 12 is a frequency analysis unit, 13 is a first beamformer, 14 is a first input direction correction unit, 15 is a second input direction correction unit, 16 is a second beamformer, 17 is a noise direction estimation unit, 18 is a first voice direction estimation unit (target sound direction estimation unit), 21 is a third input direction correction unit, 22 is a third beamformer, 23 is a second voice direction estimation unit, and 24 is an effective noise determination unit.

【０１３９】 Of these, the third input direction correction unit 21 corrects the input direction of the third beamformer 22 to the noise direction. It generates an output for successively correcting the third input direction, which is the arrival direction of the noise to be input to the third beamformer 22, based on the noise direction estimated by the noise direction estimation unit 17, and supplies this output to the third beamformer 22. Specifically, the third input direction correction unit 21 converts the data corresponding to the estimate output by the noise direction estimation unit 17 into angle information for the current target noise source direction and outputs it to the third beamformer 22 as target angle information α.

【０１４０】 The third beamformer 22 extracts the frequency spectrum components arriving from the noise source direction, using the multi-channel frequency component outputs from the frequency analysis unit 12, in this case the frequency spectra of the 1ch and 2ch audio signals. It applies adaptive filter processing with per-direction sensitivity adjustment to the frequency components (frequency spectrum data) of 1ch and 2ch, suppressing frequency spectrum components from directions other than the noise source direction and thereby extracting the frequency spectrum data from the noise source direction. Like the first and second beamformers 13 and 16, the third beamformer 22 uses the configuration described with reference to FIG. 4.

【０１４１】 The second voice direction estimation unit 23 is similar to the target sound direction estimation unit (voice direction estimation unit) 18. It estimates the target sound direction from the filter coefficients calculated by the third beamformer 22; specifically, it estimates the voice direction from the adaptive filter of the third beamformer 22 and outputs data corresponding to the estimate.

【０１４２】 The effective noise determination unit 24 determines, based on the voice direction and noise direction information estimated by the voice direction estimation units 18 and 23 and the noise direction estimation unit 17, which of the second beamformer 16 and the third beamformer 22 is tracking the noise effectively, and outputs the output of that beamformer as the noise component. Elements bearing the same reference numerals as in the configuration of FIG. 3 are the same elements; refer to the earlier description for details, which are not repeated here.

【０１４３】 As the figure shows, the speaker tracking microphone array 10 of configuration example 2 differs from that of configuration example 1 in that a third input direction correction unit 21, a third beamformer 22, a second voice direction estimation unit 23, and an effective noise determination unit 24 are added.

【０１４４】 The outputs of the second and third beamformers 16 and 22, the output of the noise direction estimation unit 17, and the outputs of the first and second voice direction estimation units 18 and 23 are passed to the effective noise determination unit 24, and the output of the effective noise determination unit 24 is passed to the first input direction correction unit 14.

【０１４５】 The operation of the system configured in this way is described below.

【０１４６】 First, the voice input unit 11, which has a plurality of microphones (in this example two, a first microphone m1 and a second microphone m2), captures audio on channels ch1 and ch2. The two channels of audio signals ch1 and ch2 input from the voice input unit 11 (the first channel ch1 carries the audio from the first microphone m1, and the second channel ch2 carries the audio from the second microphone m2) are sent to the frequency analysis unit 12, where processing such as a fast Fourier transform (FFT) yields the frequency components (frequency spectrum) of each channel.
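The per-channel frequency analysis can be sketched with a plain discrete Fourier transform applied to one frame from each microphone. A real system would use an FFT for speed; the direct DFT below computes the same spectrum and is used only to keep the example self-contained. The function names and the frame length used in the example are assumptions, since the sampling rate is not given in this passage.

```python
import cmath
import math


def dft(frame):
    """Direct discrete Fourier transform of one frame (same result as an FFT)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def analyze_channels(frames_by_channel):
    """Return one spectrum per channel, e.g. [spectrum_ch1, spectrum_ch2]."""
    return [dft(frame) for frame in frames_by_channel]
```

For example, passing two 16-sample frames, one per microphone, returns two 16-bin spectra that all of the beamformers can then share.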

【０１４７】 The per-channel frequency components obtained by the frequency analysis unit 12 are supplied to the first, second, and third beamformers 13, 16, and 22.

【０１４８】 The first beamformer 13 aligns the phases of the two channels of frequency component input to the direction of the target sound, processes them with the frequency-domain adaptive filter as described above to suppress noise, and outputs the frequency components from the target sound direction. More specifically, the first input direction correction unit 14 supplies the first beamformer 13 with the following angle information (α). Using the output from the voice direction estimation unit 18 or the voice direction estimation unit 23, supplied via the effective noise determination unit 24, the first input direction correction unit 14 supplies the first beamformer 13 with the angle information (α) needed to adjust the input phases of the two channels' frequency components so that the target sound appears to arrive from directly in front of the microphones; this angle information serves as the input direction correction amount.

【０１４９】 As a result, the first beamformer 13 corrects its target sound direction according to this correction amount (α) and suppresses sound arriving from directions other than the target sound direction, thereby suppressing the noise component and extracting the target sound.

【０１５０】 That is, in the second and third beamformers 16 and 22, the noise is the target sound, so the phase is aligned to the noise. The speaker's sound source is consequently treated as a noise source in the second and third beamformers 16 and 22, and the adaptive filter built into each of these beamformers operates so as to extract the sound from the speaker's sound source. The parameters of the adaptive filters of the second and third beamformers 16 and 22 therefore yield information reflecting the direction of the speaker's sound source.

【０１５１】 Accordingly, when the first or second voice direction estimation unit 18 or 23 determines the noise source direction from the parameters of the adaptive filter in the second or third beamformer 16 or 22, that direction reflects the direction of the speaker's sound source, which is the target sound. The first or second voice direction estimation unit 18 or 23 therefore produces an output reflecting the parameters of the adaptive filter in the second or third beamformer 16 or 22, the first input direction correction unit 14 generates an input direction correction amount (α) corresponding to this output, and the target sound direction of the first beamformer 13 is corrected according to this correction amount. The first beamformer 13 then suppresses sound arriving from directions other than the target sound direction, so the component from the speaker's sound source can be extracted.

【０１５２】 Meanwhile, the parameters of the adaptive filter of the first beamformer 13 are controlled so that the noise component is extracted, so the noise direction estimation unit 17 estimates the noise direction from these parameters and supplies that information to the second and third input direction correction units 15 and 21 and to the effective noise determination unit 24.

【０１５３】 On receiving the output from the noise direction estimation unit 17, the second input direction correction unit 15 generates an input direction correction amount (α) corresponding to that output, and the target sound direction of the second beamformer 16 is corrected according to this correction amount. The second beamformer 16 then suppresses sound arriving from directions other than that target sound direction, so the noise component, that is, the component from sources other than the speaker, can be extracted.

【０１５４】 At this time, the parameters of the adaptive filter of the second beamformer 16 are controlled so that the speaker's voice component, which is the target sound, is extracted, so the first voice direction estimation unit 18 can estimate the speaker's voice direction from these parameters. The first voice direction estimation unit 18 supplies the estimated information to the effective noise determination unit 24.

【０１５５】 The output from the noise direction estimation unit 17 is also supplied to the third input direction correction unit 21, which generates an input direction correction amount (α) corresponding to that output and supplies it to the third beamformer 22. The third beamformer 22 then corrects its own target sound direction according to the supplied correction amount.

【０１５６】 The third beamformer 22 thus suppresses sound arriving from directions other than that target sound direction, so the component from sources other than the speaker, that is, the noise component, can be extracted.

【０１５７】 At this time, the parameters of the adaptive filter of the third beamformer 22 are controlled so that the speaker's voice component, which is the target sound, is extracted, so the second voice direction estimation unit 23 can estimate the speaker's voice direction from these parameters. This estimated information is supplied to the effective noise determination unit 24.

【０１５８】 The effective noise determination unit 24 determines which of the second beamformer 16 and the third beamformer 22 is tracking the noise effectively, based on the speaker voice direction estimates supplied by the first and second voice direction estimation units 18 and 23 and the noise direction estimate supplied by the noise direction estimation unit 17. Based on this determination, it supplies the first input direction correction unit 14 with the adaptive filter parameters of the beamformer judged to be tracking effectively.

【０１５９】 The first input direction correction unit 14 therefore produces an output reflecting those parameters, generates an input direction correction amount (α) corresponding to that output, and corrects the target sound direction of the first beamformer 13 according to this correction amount. The first beamformer 13 thus suppresses sound arriving from directions other than the target sound direction, so the component from the speaker's sound source can be extracted. Moreover, even when the noise comes from a noise source that moves over a wide range, the moving noise source is not lost and its noise can be reliably captured and removed.

【０１６０】That is, in this embodiment, the first beamformer 13 is provided for extracting the speaker's voice frequency component, and the second and third beamformers 16 and 22 are provided for extracting noise frequency components. As shown in FIG. 8, viewed from the observation point, suppose the speaker is located in the 0° direction and it suffices to monitor the angular range 0° ± θ. Then the change range φ1 of the first beamformer 13 provided to extract the speaker's voice frequency component, that is, the range over which the direction of high sensitivity of the adaptive filter is varied in 1° steps, is set to at most
    −θ < φ1 < θ
and filtering is performed within this range. In this case, of the second and third beamformers 16 and 22 provided to extract noise frequency components, the change range φ2 of the second beamformer 16 is set to
    −180° + θ < φ2 < −θ
and the change range φ3 of the third beamformer 22 is set to
    θ < φ3 < 180° − θ
Here, 180° is the position opposite 0° across the center point, − denotes counterclockwise rotation in the figure as viewed from the 0° position, and + denotes clockwise rotation.
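The three ranges above can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the function name `classify_direction` and the label strings are hypothetical, angles are taken in degrees with 0° at the speaker, and angles that satisfy none of the strict inequalities (the boundaries and the sector around ±180°) are reported as `"none"`, which the text does not specify.

```python
def classify_direction(angle_deg: float, theta_deg: float) -> str:
    """Return the tracking range ("phi1", "phi2", "phi3") containing angle_deg.

    Angles are in degrees, 0 is the speaker direction, negative values are
    counterclockwise and positive values clockwise, following the text.
    The return value "none" for angles outside all three open ranges is an
    assumption made for this sketch.
    """
    if not 0.0 < theta_deg < 90.0:
        # phi2 and phi3 are empty unless theta is below 90 degrees
        raise ValueError("theta_deg must be between 0 and 90")
    if -theta_deg < angle_deg < theta_deg:
        return "phi1"  # target range of the first beamformer 13
    if -180.0 + theta_deg < angle_deg < -theta_deg:
        return "phi2"  # noise range of the second beamformer 16
    if theta_deg < angle_deg < 180.0 - theta_deg:
        return "phi3"  # noise range of the third beamformer 22
    return "none"
```

With θ = 20°, for example, a source at −90° falls in φ2 and a source at +90° falls in φ3, so the two noise ranges sit on either side of the speaker range.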

【０１６１】Therefore, with this arrangement, the second beamformer 16 and the third beamformer 22 track noise arriving from separate ranges on either side of the target sound arrival range φ1. Consequently, even if a noise source that was in the range φ2 suddenly crosses the range φ1 and moves into the range φ3, the third beamformer 22, which is responsible for the region φ3, can immediately capture the noise source that has moved in, so the noise direction is not lost.

【０１６２】In this configuration, two noise outputs are obtained, namely the output of the second beamformer 16 and the output of the third beamformer 22. Based on the result of the noise direction estimation unit 17, the effective noise determination unit 24 determines which of the second beamformer 16 and the third beamformer 22 is effectively tracking the noise, and the output of the one that is tracking effectively is used as the noise component.

【０１６３】[Overall processing flow in the speaker tracking microphone array 10 of Configuration Example 2] FIG. 9 shows the overall flow of the processing described above. This processing is performed for each frame. After the change range and the initial input direction of each beamformer are set (step S31), the processing of the first beamformer 13 is performed (step S32) and the noise direction is estimated (step S33). With this noise direction as input, the effective noise determination unit 24 determines whether the noise direction lies in φ2 or in φ3 and decides which of the second beamformer 16 and the third beamformer 22 to select (step S34).

【０１６４】The estimated noise direction is then sent to either the second input direction correction unit 15 or the third input direction correction unit 21, the noise direction is corrected, and the processing of the selected beamformer is executed.

【０１６５】That is, if the estimated noise direction is in the region φ2, the noise direction is sent to the second input direction correction unit 15, the noise direction is corrected, the processing of the second beamformer 16 is executed, and the target sound direction is estimated (steps S34, S35, S36, S37).

【０１６６】If the estimated noise direction is in the region φ3, the noise direction is sent to the third input direction correction unit 21, the noise direction is corrected, the processing of the third beamformer 22 is executed, and the target sound direction is estimated (steps S34, S38, S39, S40, S41).

【０１６７】Next, it is determined whether the voice direction (target sound direction) estimated by the selected beamformer is within the range φ1. If it is, the estimated voice direction is sent to the first input direction correction unit 14 of the first beamformer 13 and the input direction is corrected (steps S42, S43). If it is outside the range, no correction is performed and processing proceeds to the next frame (steps S42, S31).

【０１６８】This processing is performed for each frame, and noise suppression is carried out while the voice and noise directions are tracked.
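The per-frame decision logic of steps S34 to S43 can be sketched in Python as follows. This is a simplified illustration, not the implementation of FIG. 9: the names `TrackerState` and `process_frame` are hypothetical, the direction estimates are passed in as plain numbers standing in for the outputs of the estimation units, and returning `None` when the noise direction lies in neither φ2 nor φ3 is an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass


@dataclass
class TrackerState:
    theta: float                  # half-width of the speaker range phi1, degrees
    target_dir: float = 0.0       # first input direction (first beamformer 13)
    noise_dir_bf2: float = -90.0  # second input direction (second beamformer 16)
    noise_dir_bf3: float = 90.0   # third input direction (third beamformer 22)


def process_frame(state, noise_dir, target_dir_from_bf2, target_dir_from_bf3):
    """Apply the S34..S43 decisions for one frame; return 2, 3, or None."""
    t = state.theta
    # S34: select the noise beamformer from the estimated noise direction
    if -180.0 + t < noise_dir < -t:
        # S35 to S37: correct the second input direction and use the target
        # direction estimated from the second beamformer
        state.noise_dir_bf2 = noise_dir
        selected, target_est = 2, target_dir_from_bf2
    elif t < noise_dir < 180.0 - t:
        # S38 to S41: same handling for the third beamformer
        state.noise_dir_bf3 = noise_dir
        selected, target_est = 3, target_dir_from_bf3
    else:
        return None  # assumption: neither noise beamformer is updated
    # S42: accept the target estimate only if it lies within phi1
    if -t < target_est < t:
        state.target_dir = target_est  # S43: correct the first input direction
    return selected
```

Calling `process_frame` once per frame with fresh estimates keeps both the noise directions and the target direction updated, while a target estimate outside φ1 leaves the first input direction unchanged.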

【０１６９】As described above, this example comprises:
voice input means for receiving the voice uttered by a speaker at two or more different positions;
frequency analysis means for performing frequency analysis for each channel of the voice signals corresponding to the sound receiving positions and outputting frequency components of a plurality of channels;
first beamformer processing means for obtaining a target voice component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing that uses filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby performing arriving-noise suppression processing that suppresses sound other than the voice from the speaker direction;
second beamformer processing means for obtaining a first noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing that uses filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
third beamformer processing means for obtaining a second noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing that uses filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
noise direction estimation means for estimating a noise direction from the filter coefficients calculated by the first beamformer processing means;
first target sound direction estimation means for estimating a first target sound direction from the filter coefficients calculated by the second beamformer processing means;
second target sound direction estimation means for estimating a second target sound direction from the filter coefficients calculated by the third beamformer processing means;
first input direction correction means for successively correcting a first input direction, which is the arrival direction of the target sound to be input to the first beamformer, based on either or both of the first target sound direction estimated by the first target sound direction estimation means and the second target sound direction estimated by the second target sound direction estimation means;
second input direction correction means for successively correcting, when the noise direction estimated by the noise direction estimation means is within a predetermined first range, a second input direction, which is the arrival direction of the noise to be input to the second beamformer, based on that noise direction;
third input direction correction means for successively correcting, when the noise direction estimated by the noise direction estimation means is within a predetermined second range, a third input direction, which is the arrival direction of the noise to be input to the third beamformer, based on that noise direction; and
effective noise determination means for determining one of the first output noise and the second output noise to be the true noise output, based on whether the noise direction estimated by the noise direction estimation means arrived from the predetermined first range or from the predetermined second range, and outputting that noise, while at the same time determining which of the estimation results of the first voice direction estimation means and the second voice direction estimation means is valid and outputting that voice direction estimation result to the first input direction correction means.

【０１７０】With this configuration, the voice input means receives the voice uttered by the speaker at two or more different positions, and the frequency analysis means performs frequency analysis for each channel of the voice signals corresponding to the sound receiving positions and outputs frequency components of a plurality of channels. The first beamformer processing means applies, to these frequency components, adaptive filter processing using filter coefficients calculated so that sensitivity outside the desired direction becomes low, thereby performing arriving-noise suppression processing that suppresses sound other than the voice from the speaker direction, and obtains the target voice component. The second beamformer processing means applies, to the same frequency components, adaptive filter processing using filter coefficients calculated so that sensitivity outside the desired direction becomes low, thereby suppressing the voice from the speaker direction, and obtains a noise component. The noise direction estimation means estimates the noise direction from the filter coefficients calculated by the first beamformer processing means, and the target sound direction estimation means estimates the target sound direction from the filter coefficients calculated by the second beamformer processing means. Further, the first target sound direction estimation means estimates the first target sound direction from the filter coefficients calculated by the second beamformer processing means, and the second target sound direction estimation means estimates the second target sound direction from the filter coefficients calculated by the third beamformer processing means.

【０１７１】The first input direction correction means successively corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, based on either or both of the first target sound direction estimated by the first target sound direction estimation means and the second target sound direction estimated by the second target sound direction estimation means. When the noise direction estimated by the noise direction estimation means is within the predetermined first range, the second input direction correction means successively corrects the second input direction, which is the arrival direction of the noise to be input to the second beamformer, based on that noise direction. When the estimated noise direction is within the predetermined second range, the third input direction correction means successively corrects the third input direction, which is the arrival direction of the noise to be input to the third beamformer, based on that noise direction.

【０１７２】Accordingly, the second beamformer, whose second input direction is corrected by the output of the second input direction correction means, suppresses components arriving from directions other than the second input direction and extracts the remaining noise component. Likewise, the third beamformer, whose third input direction is corrected by the output of the third input direction correction means, suppresses components arriving from directions other than the third input direction and extracts the remaining noise component.

【０１７３】The effective noise determination means determines one of the first output noise and the second output noise to be the true noise output, based on whether the noise direction estimated by the noise direction estimation means arrived from the predetermined first range or from the predetermined second range, and outputs that noise. At the same time, it determines which of the estimation results of the first voice direction estimation means and the second voice direction estimation means is valid and outputs the valid voice direction estimation result to the first input direction correction means.

【０１７４】As a result, the target sound direction correction means successively corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, based on the target sound direction obtained by the target sound direction estimation means thus selected. The first beamformer therefore suppresses noise components arriving from directions other than the first input direction and extracts the speaker's voice component with low noise.

【０１７５】As described above, this system can separately obtain a voice frequency component in which the noise component is suppressed and a noise frequency component in which the voice component is suppressed. The most significant feature of the invention is that beamformers operating in the frequency domain are used as the first to third beamformers, which makes it possible to greatly reduce the amount of computation.

【０１７６】According to the invention, in addition to the large reduction in the processing load of the adaptive filters, frequency analysis other than that performed on the input voice can be omitted, and the conversion from the time domain to the frequency domain that was previously required for filter computation also becomes unnecessary, so the overall amount of computation can be greatly reduced.

【０１７７】Further, in the present invention, noise-tracking beamformers with entirely different monitoring regions are provided, the voice direction is estimated from each of them, and it is determined from the estimation results which one is tracking the noise effectively. The voice direction estimate based on the filter coefficients of the beamformer judged effective is given to the first target sound direction correction means, which successively corrects the first input direction, that is, the arrival direction of the target sound to be input to the first beamformer, based on the target sound direction estimated by the target sound direction estimation means. The first beamformer can therefore suppress noise components arriving from directions other than the first input direction and extract the speaker's voice component with low noise, and even if the noise source moves, it can be tracked and suppressed without being lost.

【０１７８】In the prior art, to make it possible to track the target sound source with only two channels, that is, two microphones, one noise-tracking beamformer is used in addition to the noise-suppressing beamformer. However, when, for example, the noise source moved across the direction of the target sound, the noise tracking accuracy sometimes decreased.

【０１７９】In the present invention, however, a plurality of noise-tracking beamformers are used, each responsible for a separate tracking range, so that degradation of tracking accuracy can be prevented even in such a case.

【０１８０】The speaker tracking microphone arrays 10 of Configuration Examples 1 and 2 above are examples that mainly suppress directional noise while reducing the computational load. They are suited to use in environments such as video conferencing systems, where the arrangement of the speaker sound sources is known and ambient noise is low, but they may be insufficient for use outdoors or in places such as crowded stores and stations, which are affected by miscellaneous noise of widely varying levels and characteristics.

【０１８１】It is therefore desirable to be able to suppress non-directional background noise effectively as well. This can be done by further adding a spectral subtraction (SS) processing function, as described next.

【０１８２】That is, directional noise is suppressed by the beamformer, and non-directional background noise is suppressed by spectral subtraction (SS) processing. To this end, the spectral subtraction (SS) processing unit 30 configured as shown in FIG. 10 is connected downstream of the system configured as shown in FIG. 3 or FIG. 7.

【０１８３】As shown in the figure, the spectral subtraction (SS) processing unit 30 comprises a voice band power calculation unit 31, a noise band power calculation unit 32, a band weight calculation unit 33, and a spectrum subtraction unit 34.

【０１８４】Of these, the voice band power calculation unit 31 divides the voice frequency components obtained by the beamformer 13 into frequency bands and calculates the voice power for each band. The noise band power calculation unit 32 divides the noise frequency components obtained by the beamformer 16 (or the noise frequency components obtained by the beamformers 16 and 22 respectively and then selected and output by the effective noise determination unit 24) into frequency bands and calculates the noise power for each band.

【０１８５】The band weight calculation unit 33 calculates a band weight coefficient W(k) for each band k, using the obtained average band power of the voice Pv(k) and average band power of the noise Pn(k). The modified spectrum subtraction unit 34 suppresses background noise by weighting each frequency band of the voice signal, based on the input band power calculated by the input band power calculation unit 31 and the voice band power calculated by the voice band power calculation unit 31.

【０１８６】The voice frequency components used by the voice band power calculation unit 31 and the noise frequency components used by the noise band power calculation unit 32 are the two outputs of the beamformers of Example 0A1 or Example 0A2, namely the target voice component and the noise component. Non-directional background noise components are then suppressed by the noise suppression processing generally known as spectral subtraction (SS).

【０１８７】Spectral subtraction (SS) as commonly practiced uses a one-channel microphone (that is, a single microphone) and estimates the noise power from the microphone output in intervals without voice. It therefore cannot cope with non-stationary noise superimposed on the voice.

【０１８８】Also, when two-channel microphones (that is, two microphones) are used, one for collecting noise and the other for collecting noise-superimposed voice, the two microphones must be placed apart from each other. As a result, the noise superimposed on the voice and the noise picked up by the noise-collecting microphone are out of phase, and spectral subtraction did not greatly improve noise suppression.

【０１８９】In this embodiment, a beamformer that extracts the noise component is provided and its output is used. Consequently, as described in Configuration Examples 1 and 2, the phase shift is corrected, and highly accurate spectral subtraction (SS) can be achieved even for non-stationary noise.

【０１９０】Furthermore, because the outputs of frequency-domain beamformers are used, spectral subtraction can be performed without a separate frequency analysis, and non-stationary noise can be suppressed with less computation than before.

【０１９１】A specific spectral subtraction (SS) method is described below.

【０１９２】<Principle of spectral subtraction (SS)> First, the principle of spectral subtraction is described.

【０１９３】Let Pv be the output of the target voice beamformer (first beamformer 13) and Pn the output of the noise beamformer (second or third beamformer 16 or 22). They can be expressed as
    Pv = V + B′
    Pn = N + B″
where V is the power of the voice component, B′ is the power of the background noise contained in the voice output, N is the power of the noise source component, and B″ is the power of the background noise contained in the noise output. Of these, the background noise component contained in the voice output is suppressed by spectral subtraction processing.

【０１９４】If B′ in the voice output is taken to be equivalent to B″ in the noise output, and the power N of the noise source component is small compared with the power V of the voice component, then B′ = Pn can be assumed, and the weight coefficient W for spectral subtraction (SS) processing can be obtained as
    W = (Pv − Pn) / Pv ≈ V / (V + B′)
so that the voice component can be obtained approximately as V ≈ Pv × W.
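The weight formula can be checked numerically with a short Python sketch. The function name `ss_weight` and the lower bound `w_min` are assumptions added for illustration; the floor simply keeps the weight from going negative when the noise estimate exceeds the voice output, a case this paragraph does not discuss.

```python
def ss_weight(pv: float, pn: float, w_min: float = 0.0) -> float:
    """Spectral subtraction weight W = (Pv - Pn) / Pv, floored at w_min.

    pv is the power of the target voice beamformer output and pn the power
    of the noise beamformer output. The floor w_min is an assumption of
    this sketch, not a value given in the text.
    """
    if pv <= 0.0:
        return w_min
    return max((pv - pn) / pv, w_min)


# Worked example: V = 9 and B' = 1 give Pv = 10; with N negligible, Pn = B'' = 1.
w = ss_weight(10.0, 1.0)
voice_estimate = 10.0 * w  # close to V = 9 (W is 0.9 up to floating-point rounding)
```

In this example W equals V / (V + B′) = 9 / 10, so multiplying Pv by W recovers the voice power V, which is the approximation the paragraph describes.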

【０１９５】This processing is performed by the spectral subtraction processing unit 30. A specific example of the spectral subtraction processing unit 30 is described next.

【０１９６】<Configuration example of the spectral subtraction processing unit> FIG. 10 shows the configuration required for spectral subtraction (SS) processing, and FIG. 11 shows the spectral subtraction processing procedure.

【０１９７】As shown in FIG. 10, the spectral subtraction processing unit 30 comprises a voice band power calculation unit 31, a noise band power calculation unit 32, a band weight calculation unit 33, a spectrum subtraction unit 34, and a control unit 35.

【０１９８】Of these, the voice band power calculation unit 31 calculates the voice band power using the voice frequency components output from the first beamformer 13, averages the calculated power values in the time direction to obtain the average power for each band (the average band power of the voice, Pv(k)), and gives it to the band weight calculation unit 33. The noise band power calculation unit 32 calculates the noise band power using the noise frequency components output from the second beamformer 16 (or the third beamformer 22), averages the calculated power values in the time direction to obtain the average power for each band (the average band power of the noise, Pn(k)), and gives it to the band weight calculation unit 33.
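A minimal Python sketch of the band power calculation and time averaging follows. Several details are assumptions, because the text does not fix them: the bands are taken as contiguous, roughly equal-width groups of frequency bins, the band power is the mean of |X|² over the bins in the band, and the time averaging is an exponential average with smoothing factor `alpha`. The names `band_powers` and `update_average` are hypothetical.

```python
def band_powers(spectrum, num_bands):
    """Mean power |X|^2 of the bins in each of num_bands contiguous bands."""
    n = len(spectrum)
    if not 0 < num_bands <= n:
        raise ValueError("num_bands must be between 1 and len(spectrum)")
    powers = []
    for k in range(num_bands):
        lo = k * n // num_bands        # first bin of band k
        hi = (k + 1) * n // num_bands  # one past the last bin of band k
        powers.append(sum(abs(x) ** 2 for x in spectrum[lo:hi]) / (hi - lo))
    return powers


def update_average(avg, current, alpha=0.9):
    """Exponential time average of per-band powers (assumed averaging rule)."""
    if avg is None:
        return list(current)  # first frame: start from the current powers
    return [alpha * a + (1.0 - alpha) * c for a, c in zip(avg, current)]
```

Running `band_powers` on the output spectrum of the first beamformer and on that of the selected noise beamformer, and feeding each result through `update_average` frame by frame, would give quantities playing the roles of Pv(k) and Pn(k).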

【０１９９】The band weight calculation unit 33 calculates a band weight coefficient W(k) for each band k, using the obtained average band power of the voice Pv(k) and average band power of the noise Pn(k). The spectrum subtraction unit 34 uses the weight coefficient W(k) calculated for each band by the band weight calculation unit 33 to weight the voice frequency component Pv(k) input from the first beamformer 13, thereby obtaining the voice frequency component Pv(k)′ in which the noise component is suppressed.
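The weighting step can be sketched in Python as below. The sketch assumes that W(k) = (Pv(k) − Pn(k)) / Pv(k) with a non-negative floor `w_min`, that bands are contiguous groups of bins, and that the weight acts on power, so each complex bin is scaled by the square root of W(k). Applying W(k) directly to the amplitude is an equally plausible reading of the text. The names `band_weights` and `apply_weights` are hypothetical.

```python
import math


def band_weights(pv_avg, pn_avg, w_min=0.0):
    """Per-band weights W(k) = (Pv(k) - Pn(k)) / Pv(k), floored at w_min >= 0."""
    weights = []
    for pv, pn in zip(pv_avg, pn_avg):
        w = (pv - pn) / pv if pv > 0.0 else w_min
        weights.append(max(w, w_min))
    return weights


def apply_weights(spectrum, weights):
    """Scale the bins of each band so that the band power is multiplied by W(k)."""
    n, b = len(spectrum), len(weights)
    out = list(spectrum)
    for k, w in enumerate(weights):
        lo, hi = k * n // b, (k + 1) * n // b  # bins belonging to band k
        gain = math.sqrt(w)  # assumption: weight applies to power
        for i in range(lo, hi):
            out[i] = spectrum[i] * gain
    return out
```

A band whose noise estimate is as large as its voice output receives weight `w_min` and is attenuated most strongly, while a band with little estimated noise passes almost unchanged.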

【０２００】The control unit 35 receives a signal from the spectral subtraction control unit 50 and controls the band weight calculation unit 33 according to the signal type. When the signal from the spectral subtraction control unit 50 is "0", it issues a control command that causes the band weight calculation unit 33 to generate the minimum weight, so that the output of the band weight calculation unit 33 is the minimum weight. When the signal from the spectral subtraction control unit 50 is "1" or "2", the control unit 35 controls the band weight calculation unit 33 so that it calculates the normal weight coefficients.
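The signal-dependent control described in this paragraph can be sketched as a small Python dispatch. The function name `select_weights`, the numeric value of the minimum weight `w_min`, and raising an error for any other signal value are assumptions of this sketch; the text only states the behavior for the signals "0", "1", and "2".

```python
def select_weights(signal, normal_weights, w_min=0.01):
    """Choose per-band weights from the control signal (0, 1, or 2).

    signal 0      -> every band gets the minimum weight w_min
    signal 1 or 2 -> the normally calculated weights are used as they are
    The value of w_min is a placeholder, not a value from the text.
    """
    if signal == 0:
        return [w_min] * len(normal_weights)
    if signal in (1, 2):
        return list(normal_weights)
    raise ValueError("signal must be 0, 1, or 2")
```

For example, `select_weights(0, [0.5, 0.9])` yields the minimum weight in both bands, which strongly attenuates the output, whereas signals 1 and 2 pass the calculated weights through.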

【０２０１】On receiving a signal from the spectrum subtraction control unit 50, the control unit 35 controls the band weight calculation unit 33 and the spectrum subtraction unit 34 according to the signal type. When the signal is "0", the control unit 35 issues a control command that makes the band weight calculation unit 33 output the minimum weight, and has the spectrum subtraction unit 34 apply this minimum weight to the voice frequency component output by the first beamformer 13, so that the output signal is cut. When the signal is "1", the control unit 35 considers that sudden noise is superimposed on a voice section, and controls the spectrum subtraction unit 34 to perform two-channel spectral subtraction in which the output of the second beamformer 16 is treated as the noise component. When the signal is "2", the control unit 35 regards the section as voice only, and controls the spectrum subtraction unit 34 to perform one-channel spectral subtraction on the output of the first beamformer 13.

【０２０２】Note that when the signal from the spectrum subtraction control unit 50 is "1" or "2", the control unit 35 has the band weight calculation unit 33 calculate the band weight coefficient W(k) for each band k from the average band power Pv(k) of the voice and the average band power Pn(k) of the noise. Only when the signal is "0" does the control unit 35 have it set the band weight coefficient W(k) to its minimum.

【０２０３】A voice frequency component and a noise frequency component are obtained as the outputs of the two beamformers 13 and 15 (or 22). Voice band power calculation is performed using the voice frequency component output by the first beamformer 13 (step S51), and noise band power calculation is performed using the noise frequency component output by the beamformer 15 (or 22) (step S52).

【０２０４】The power calculation here uses the voice and noise frequency components of the system of the present invention described above. Because the beamformer processing is performed in the frequency domain, the power can be calculated directly for each band of the voice and noise frequency components, without any further frequency analysis.
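
As an illustration of the per-band power calculation, the following minimal Python sketch sums the squared magnitudes of the frequency bins that fall in each band. The function name `band_powers` and the way bands are described (a list of bin-index boundaries) are assumptions made for this example; the text does not specify how the bands are laid out.

```python
def band_powers(spectrum, band_edges):
    """Sum |X(f)|^2 over the bins of each band for one frame.

    spectrum   -- list of complex frequency-domain bins (beamformer output)
    band_edges -- hypothetical band boundaries given as bin indices;
                  [0, 2, 4] defines the bands [0, 2) and [2, 4)
    """
    powers = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # No FFT is needed here: the input is already in the frequency domain.
        powers.append(sum(abs(x) ** 2 for x in spectrum[lo:hi]))
    return powers


frame = [1 + 0j, 0 + 1j, 2 + 0j, 0 + 0j]
print(band_powers(frame, [0, 2, 4]))
```

The same function would be applied to the voice and the noise frequency components to obtain the two sets of band powers.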

【０２０５】Next, the calculated power values are averaged in the time direction to obtain the average power for each band (step S53). The band weight calculation unit 33 then calculates the band weight coefficient W(k) for each band k from the average band power Pv(k) of the voice and the average band power Pn(k) of the noise, using the following equation.

【０２０６】W(k) = (Pv(k) − Pn(k)) / Pv(k)   (when Pv(k) > Pn(k))
W(k) = Wmin   (when Pv(k) <= Pn(k))
The band weight takes a value between the maximum value 1.0 and the minimum value Wmin, where Wmin is set to, for example, 0.01.
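
The band weight rule above can be sketched in Python as follows. The function name `band_weights` is hypothetical. Clamping the first branch at Wmin is an assumption added here so that the result stays within the stated range of Wmin to 1.0 even when Pv(k) is only slightly larger than Pn(k); the text does not spell out that case.

```python
def band_weights(pv, pn, w_min=0.01):
    """Per-band weights W(k) from average voice power pv and noise power pn."""
    weights = []
    for v, n in zip(pv, pn):
        if v > n:
            # W(k) = (Pv(k) - Pn(k)) / Pv(k); the max() clamp is an assumption.
            weights.append(max((v - n) / v, w_min))
        else:
            # Noise power is at least as large as voice power: minimum weight.
            weights.append(w_min)
    return weights


print(band_weights([4.0, 1.0, 2.0], [1.0, 2.0, 2.0]))
```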

【０２０７】Next, the spectrum subtraction unit 34 applies the per-band weight coefficient W(k) calculated by the band weight calculation unit 33 to the input voice frequency component Pv(k), obtaining a voice frequency component Pv(k)′ in which the noise component is suppressed (step S54).

【０２０８】Pv(k)′ = Pv(k) * W(k)
In this way, background noise that has no direction is suppressed by the spectral subtraction (SS) processing, while noise that has a direction is suppressed by the beamformer described above, so that highly accurate noise suppression becomes possible overall.
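
The weighting step Pv(k)′ = Pv(k) * W(k) is a per-band multiplication. A minimal Python sketch, with the hypothetical name `apply_band_weights` and with the weights assumed to have been computed beforehand:

```python
def apply_band_weights(pv, weights):
    """Return Pv(k)' = Pv(k) * W(k) for every band k."""
    if len(pv) != len(weights):
        raise ValueError("pv and weights must have one entry per band")
    return [p * w for p, w in zip(pv, weights)]


print(apply_band_weights([4.0, 2.0], [0.75, 0.5]))
```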

【０２０９】The spectrum subtraction processing unit 30 receives the voice frequency component from the first beamformer 13 of the speaker tracking microphone array 10 and the noise frequency component from the second beamformer 16 (or the third beamformer 22), and basically suppresses noise by the processing described above. In addition, by performing the required noise suppression processing according to the three types of signal output by the spectrum subtraction control unit 50, it can also suppress sudden noise.

【０２１０】That is, the spectrum subtraction processing unit 30 receives the signal from the spectrum subtraction control unit 50. When the signal is "0", the section can be regarded as almost entirely noise, so the control unit 35 has the band weight calculation unit 33 output the minimum weight coefficient and has the spectrum subtraction unit 34 apply it to the voice frequency component, thereby cutting the output signal of the spectrum subtraction processing unit 30.

【０２１１】When the signal from the spectrum subtraction control unit 50 is "1", sudden noise is regarded as being superimposed on a voice section. The control unit 35 has the band weight calculation unit 33 perform the normal weight coefficient calculation and pass the resulting weight coefficients to the spectrum subtraction unit 34, which uses them to perform two-channel spectral subtraction with the output of the second beamformer 16 treated as the noise component.

【０２１２】When the signal from the spectrum subtraction control unit 50 is "2", the spectrum subtraction processing unit 30 regards the section as voice only. The control unit 35 has the band weight calculation unit 33 perform the normal weight coefficient calculation and pass the resulting weight coefficients to the spectrum subtraction unit 34, which uses them to perform one-channel spectral subtraction on the output of the first beamformer 13 and outputs the result as the voice frequency component.
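
The three-way control described for signals "0", "1" and "2" can be sketched as a small dispatch function. This is only an illustration: the function name `select_weights` and the mode labels are hypothetical, and the sketch covers only the weight selection, not how the one-channel and two-channel subtraction modes differ internally.

```python
W_MIN = 0.01  # example minimum weight, matching the value given in the text


def select_weights(control_signal, pv, pn, w_min=W_MIN):
    """Return (mode, weights) for control signal 0, 1 or 2.

    pv, pn -- average band powers of the voice and of the noise
    """
    if control_signal == 0:
        # Almost entirely noise: force the minimum weight to cut the output.
        return "cut", [w_min] * len(pv)
    if control_signal in (1, 2):
        # Normal weight calculation; the max() clamp is an assumption.
        weights = [max((v - n) / v, w_min) if v > n else w_min
                   for v, n in zip(pv, pn)]
        mode = "two_channel_ss" if control_signal == 1 else "one_channel_ss"
        return mode, weights
    raise ValueError("control signal must be 0, 1 or 2")


print(select_weights(0, [4.0], [1.0]))
print(select_weights(1, [4.0], [1.0]))
```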

【０２１３】In this configuration, the voice band power calculation means divides the obtained spectral components of the voice frequency into frequency bands and calculates the voice power for each band, and the noise band power calculation means divides the obtained spectral components of the noise frequency into frequency bands and calculates the noise power for each band. The spectrum subtraction means then suppresses the background noise by weighting each frequency band of the voice signal, based on the voice and noise frequency band powers obtained from the voice band power calculation means and the noise band power calculation means.

【０２１４】With this configuration, non-directional noise (background noise), which a beamformer cannot suppress, is suppressed by applying spectral subtraction to the target voice component and the noise component that the beamformers of the system of the present invention provide. That is, this system has two beamformers, one for extracting the target voice component and one for extracting the noise component, and suppresses the non-directional background noise component by performing spectral subtraction on their outputs. Spectral subtraction (SS) is a known noise suppression technique, but as commonly practiced it uses a one-channel microphone (that is, a single microphone) and estimates the noise power from the microphone output in sections that contain no voice, so it cannot cope with non-stationary noise superimposed on the voice. When a two-channel microphone (that is, two microphones) is used, with one microphone collecting noise and the other collecting the noise-superimposed voice, the two microphones must be installed apart from each other. As a result, the noise superimposed on the voice and the noise captured by the noise-collecting microphone are out of phase, and spectral subtraction does not improve noise suppression much.

【０２１５】In the present invention, however, a beamformer that extracts the noise component is provided and its output is used, so the phase shift is already corrected, and highly accurate spectral subtraction can be achieved even for non-stationary noise. Furthermore, because the output of a frequency-domain beamformer is used, spectral subtraction can be performed without a separate frequency analysis, and non-stationary noise can be suppressed with less computation than before.

【０２１６】<Configuration example of directionality detection unit 40>
Next, a configuration example of the directionality detection unit 40, an important component of this system, is described.

【０２１７】FIG. 12 is a block diagram showing the basic configuration of the directionality detection unit 40. In FIG. 12, reference numeral 1 denotes a facing directional microphone and 2 denotes a frequency analysis unit; these are components of the speaker tracking microphone array 10. Reference numeral 41 denotes a phase difference equivalent amount calculation unit, 42 an inter-channel power ratio calculation unit, and 43 a directionality calculation unit. The phase difference equivalent amount calculation unit 41, the inter-channel power ratio calculation unit 42, and the directionality calculation unit 43 together constitute the directionality detection unit 40.

【０２１８】As described above, the facing directional microphone 1 is a two-channel (2ch) microphone in which two directional microphones are arranged with their axes inclined relative to each other. The frequency analysis unit 12 receives the two channels of audio signal from the facing directional microphone 1 and calculates the frequency components of each channel by, for example, a fast Fourier transform (FFT). The phase difference equivalent amount calculation unit 41 obtains, from the per-channel frequency components calculated by the frequency analysis unit 12, a phase difference equivalent amount T corresponding to the phase difference between the two channels.

【０２１９】The inter-channel power ratio calculation unit 42 sums the frequency component data calculated by the frequency analysis unit 12 separately for each channel to obtain the total power over all frequency components of each channel. From these per-channel total powers it forms the two ratios between the channels (one to the other, and the other to the one) and takes the larger of the two as the power ratio R.

【０２２０】The directionality calculation unit 43 multiplies the phase difference equivalent amount T obtained by the phase difference equivalent amount calculation unit 41 by the power ratio R obtained by the power ratio calculation unit 42, and takes the product as the directionality index D.

【０２２１】Next, the operation of the directionality detection unit 40 configured as above is described.

【０２２２】The present invention realizes a noise suppression processing technique using microphones of two or more channels. For simplicity, the most basic configuration, a two-channel microphone, is taken as the example here.

【０２２３】FIG. 13 shows the flow of processing. Following FIG. 13, the voice input unit 11, a facing directional microphone in which two directional microphones are arranged with their axes inclined, is first installed at the sound collection location. The sound input from this facing directional microphone is sent to the frequency analysis unit 12, where the frequency components of each channel are calculated by, for example, a fast Fourier transform (FFT).

【０２２４】On receiving the frequency component data for the two channels (ch1, ch2), the directionality detection unit 40 has the phase difference equivalent amount calculation unit 41 calculate, from the received per-channel frequency component data, the amount T corresponding to the phase difference between the two channels.

【０２２５】The phase difference equivalent amount T is obtained, for example, as follows. First, the cross spectrum Wxy between the two channels (ch1, ch2) is obtained. Then, with the imaginary component of Wxy replaced by its absolute value, the sum over all frequency components is taken, as shown in equation (1).

【０２２６】S = Σ {Real(Wxy), |Imag(Wxy)|}   … (1)
Next, as shown in equation (2), the phase T of S is obtained with the arctangent function and is used as the phase difference equivalent amount.

【０２２７】T = atan2(Imag(S), Real(S))   … (2)
Although all frequency components are used here, this is not a requirement; for example, only the frequency components whose power exceeds the average power may be used.
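
Equations (1) and (2) can be sketched in Python as below. The text does not define the cross spectrum explicitly, so taking it as X(k) multiplied by the complex conjugate of Y(k) is an assumption, as is the function name `phase_difference_equivalent`. The sketch uses all frequency components.

```python
import math


def phase_difference_equivalent(x, y):
    """Phase difference equivalent amount T from two complex spectra x and y."""
    s_real = 0.0
    s_imag = 0.0
    for xk, yk in zip(x, y):
        wxy = xk * yk.conjugate()  # assumed definition of the cross spectrum
        s_real += wxy.real         # Real(Wxy) term of equation (1)
        s_imag += abs(wxy.imag)    # |Imag(Wxy)| term of equation (1)
    return math.atan2(s_imag, s_real)  # equation (2)


same = [1 + 1j, 2 - 1j]
print(phase_difference_equivalent(same, same))
```

Because the imaginary part is summed as an absolute value, T is non-negative in this sketch and does not distinguish left from right; it only indicates how far the two channels are from being in phase.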

【０２２８】Meanwhile, the inter-channel power ratio calculation unit 42 uses the frequency component data obtained by the frequency analysis unit 12 to obtain the total power over all frequency components for each channel, according to equation (3).

【０２２９】Px = ΣX, Py = ΣY   … (3)
(X and Y are the power components of each frequency band of the respective channels.) The larger of Px/Py and Py/Px is then passed to the directionality calculation unit 43 as the power ratio R.
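
A minimal Python sketch of equation (3) and the selection of the power ratio R follows. The function name `power_ratio` is hypothetical, and the sketch assumes both channels have non-zero total power; handling of a silent channel is not covered by the text.

```python
def power_ratio(x_power, y_power):
    """Power ratio R from the per-band power components of two channels."""
    px = sum(x_power)  # Px in equation (3)
    py = sum(y_power)  # Py in equation (3)
    # Taking the larger of the two ratios means R is always at least 1.
    return max(px / py, py / px)


print(power_ratio([1.0, 3.0], [1.0, 1.0]))
```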

【０２３０】The directionality calculation unit 43 obtains the directionality index D from, for example, the phase difference equivalent amount T calculated by the phase difference equivalent amount calculation unit 41 and the power ratio R calculated by the inter-channel power ratio calculation unit 42, as in equation (4).

【０２３１】D = T * R   … (4)
Instead of equation (4), the directionality index D can also be obtained as a weighted sum, as in equation (5).

【０２３２】D = a * T + b * R   (a and b are constants)   … (5)
In short, the key point of the present invention is that the directionality detection unit 40 derives the directionality index D using not only the phase difference equivalent amount T obtained from the frequency analysis result but also the power ratio R between the channels.
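
The two ways of combining T and R, equations (4) and (5), can be sketched as follows. The function names are hypothetical, and the constants a and b in the example are arbitrary illustration values, since the text gives none.

```python
def directionality_product(t, r):
    """Equation (4): D = T * R."""
    return t * r


def directionality_weighted_sum(t, r, a, b):
    """Equation (5): D = a * T + b * R, with constants a and b."""
    return a * t + b * r


# Identical signals on both channels give T = 0 and R = 1.
print(directionality_product(0.0, 1.0))
print(directionality_weighted_sum(0.0, 1.0, a=1.0, b=0.5))
```

One consequence worth noting: with equation (4), identical channels give D = 0, whereas with equation (5) they give D = b, so any threshold applied to D would presumably need to account for that offset.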

【０２３３】Introducing the power ratio R into the directionality index D takes the amplitude difference between the waveforms into account. When a signal arrives from the front and exactly the same signal is output on both channels, the directionality index D is "0"; otherwise it takes a large value. Whether or not the arrival direction is the front can therefore be judged from the magnitude of D.

【０２３４】The directionality detection unit 40 performs this processing at short fixed intervals, for example every 8 msec. This fixed interval is hereinafter called a frame.

【０２３５】Accordingly, by repeating this processing with a frame period of, for example, 8 msec, the directionality index D of the incoming sound, which changes from moment to moment, can be obtained.

【０２３６】<Another embodiment of directionality detection unit 40>
The directionality detection unit 40 can also be realized with the configuration shown in FIG. 14. In FIG. 14, reference numeral 11 denotes the same facing directional microphone as described above, 12 a frequency analysis unit, 44 a spectrum normalization unit, and 45 an inter-channel spectrum difference calculation unit.

【０２３７】Of these, the facing directional microphone 1 is the same as described above: a two-channel (2ch) microphone in which two directional microphones are arranged with their axes inclined relative to each other. The frequency analysis unit 2 receives the two channels of audio signal from the facing directional microphone 1 and calculates the frequency components of each channel by, for example, a fast Fourier transform (FFT). The spectrum normalization unit 44 receives the frequency component data from the frequency analysis unit 12 and normalizes it for each channel, and the inter-channel spectrum difference calculation unit 45 calculates the inter-channel spectrum difference from the normalized frequency components.

【０２３８】In other words, in this example the direction detection unit 40 is made up of the spectrum normalization unit 44 and the inter-channel spectrum difference calculation unit 45.

【０２３９】In this example, to reduce the amount of computation, the phase difference equivalent amount T is not obtained directly; instead, a directionality index D that takes the phase difference into account indirectly is obtained using only the amplitude difference.

【０２４０】FIG. 15 shows the flow of processing.

【０２４１】First, the two channels of sound input from the facing directional microphone 11, in which two directional microphones are arranged with their axes inclined, are sent to the frequency analysis unit 12, where the frequency components of each channel are calculated by, for example, a fast Fourier transform (FFT). The per-channel frequency components obtained by the frequency analysis unit 12 are then input to the spectrum normalization unit 44.

【０２４２】Next, the spectrum normalization unit 44 obtains the square root Rx of the total power of the frequency components of one of the input channels (equation (6)), and normalizes the frequency components of both channels by this Rx (equation (7)).

【０２４３】Rx = √(ΣX)   … (6)
X′(k) = X(k) / Rx
Y′(k) = Y(k) / Rx   … (7)
Next, the normalized frequency components of the channels are input to the inter-channel spectrum difference calculation unit 45, which calculates the power of the difference between the normalized spectra by equation (8) and takes it as the directionality index D.

【０２４４】D = Σ |X′(k) − Y′(k)|^2   … (8)
As a result, D is "0" when a signal arrives from the front and exactly the same signal is output on both channels, and takes a large value otherwise, so whether or not the arrival direction is the front can be judged from the magnitude of this value.
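
Equations (6) to (8) can be sketched in Python as follows. The function name `spectral_difference_index` is hypothetical. The sketch assumes that ΣX in equation (6) is the sum of squared magnitudes of the first channel's spectrum, and that this channel has non-zero power; it accepts either amplitude values or complex bins.

```python
import math


def spectral_difference_index(x, y):
    """Directionality index D from the normalized spectral difference."""
    # Equation (6): square root of the total power of one channel (here x).
    rx = math.sqrt(sum(abs(xk) ** 2 for xk in x))
    # Equation (7): both channels are normalized by the same Rx.
    xn = [xk / rx for xk in x]
    yn = [yk / rx for yk in y]
    # Equation (8): power of the difference between the normalized spectra.
    return sum(abs(a - b) ** 2 for a, b in zip(xn, yn))


print(spectral_difference_index([3.0, 4.0], [3.0, 4.0]))
```

Normalizing both channels by the same Rx keeps the amplitude difference between the channels in the result, which is what lets D grow when the sound does not arrive from the front.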

【０２４５】The spectrum normalization is not limited to the above example. For example, the sum of the absolute values of the two channels' spectra may be used, the absolute value of the sum of the two channels' spectra may be used, or the larger of the two channels' absolute values may be used. Alternatively, each frequency component may be normalized by the absolute value from whichever of the two channels has the larger total power over its frequency components.

【０２４６】By repeating this processing with a frame period of, for example, 8 msec, the directionality index D of the incoming sound, which changes from moment to moment, can be obtained.

【０２４７】(Specific configuration example 1 of the noise suppression processing apparatus of the present invention)
The basic configuration example shown in FIG. 1 is a noise suppression processing apparatus of the present invention, realized as one example using the speaker tracking microphone array 10, the spectrum subtraction processing unit 30, the direction detection unit 40, and the spectrum subtraction control unit 50 configured as described above.

【０２４８】The operation of this noise suppression processing apparatus is described next. When the basic configuration of the noise suppression apparatus according to the present invention shown in FIG. 1 adopts the configuration of FIG. 3 as the speaker tracking microphone array 10, the configuration of FIG. 10 as the spectrum subtraction processing unit 30, and the configuration of FIG. 12 or FIG. 14 as the direction detection unit 40, the result is as shown in FIG. 16.

【０２４９】This system is built on a noise suppression processing apparatus in which at least two channels of directional microphones are arranged as the voice input unit with their axes inclined relative to each other, the audio signals obtained by these microphones are frequency-analyzed channel by channel, and adaptive beamformer processing using filter coefficients calculated so as to lower the sensitivity outside the desired direction is applied, both to suppress the voice from the speaker direction and obtain the noise component, and to suppress that noise component and obtain a speaker voice component with little noise. Using this apparatus, the system obtains a noise-suppressed speaker voice component and a noise component (step S131 in FIG. 17). A directionality detection unit based on short-time frequency analysis is added to this apparatus to detect the arrival of sudden noise, fast-moving sound sources, and the like that beamformer processing cannot suppress (step S132 in FIG. 17). The spectral subtraction performed with the speaker voice component and noise component obtained by the noise suppression processing apparatus is then controlled using this detection result (steps S133 and S134 in FIG. 17). In this way the system keeps a speaker tracking function in which the allowable range of the speaker direction can be set with high accuracy, while also suppressing sudden noise, fast-moving sound sources, and the like.

【０２５０】That is, in the noise suppression processing apparatus of FIG. 16, the sound input from the voice input unit 11, which has two microphones, is sent to the frequency analysis unit 12, where the frequency components are calculated by, for example, a fast Fourier transform (FFT). The resulting per-channel frequency component data is passed to the first and second beamformers 13 and 16 and to the directionality detection unit 40.

【０２５１】The first beamformer 13 suppresses noise in the frequency components of the two-channel input from the frequency analysis unit 12 by means of a frequency-domain adaptive filter, and outputs the frequency component in the direction of the target sound (voice frequency component output). Here, the first input direction correction unit 14 aligns the phases using the output of the target sound direction estimation unit 18, so that the target sound direction is treated as the front of the microphones.

【０２５２】The second beamformer 16 suppresses the target sound in the frequency components of the two-channel input from the frequency analysis unit 12 by means of a frequency-domain adaptive filter, and outputs the frequency component in the direction of the noise (noise frequency component output). Here, the noise direction is assumed to be the front of the microphones, and the second input direction correction unit 15 aligns the phases (phasing) using the output of the noise direction estimation unit 17, so that the noise can be regarded as arriving at the two microphones simultaneously.

【０２５３】The noise direction estimation unit 17 estimates the noise direction from the adaptive filter of the first beamformer 13, and the target sound direction estimation unit 18 estimates the target sound direction from the adaptive filter of the second beamformer 16. These processes are executed at fixed intervals, for example every 8 ms.

【０２５４】Next, the directionality detection unit 40 and the spectral subtraction control unit 50, which are among the important elements of the system of the present invention, are described.

【０２５５】As described above, the directionality detection unit 40 computes the directionality index D from a frequency analysis such as a short-time FFT, using not only the phase difference T between the two microphones but also the power ratio R of the input signals of the channels. It then supplies the computed directionality index D to the spectral subtraction control unit 50.

【０２５６】The spectral subtraction control unit 50 generates one of three signals ("0", "1", or "2") based on the directionality index D and on the target sound direction (speaker direction) obtained from the target sound direction information output by the target sound direction estimation unit 18, and sends it to the spectral subtraction processing unit 30.

【０２５７】Of the three signals, "0" indicates an interval that is almost entirely noise, "1" indicates an interval in which a large sudden noise is superimposed on speech, and "2" indicates an interval that is almost entirely speech.

【０２５８】The signal is determined as follows. First, two settings are chosen: the allowable range of the speaker direction, in degrees from the center of the two microphones, and the threshold for the directionality index D. For example, the allowable range of the speaker direction may be set to ±20° from the center of the two microphones and the threshold for D to 1.0; the values are chosen to suit the usage environment.

【０２５９】The unit then determines whether the directionality index D sent from the directionality detection unit 40 is at or below the threshold (1.0). If it is, the unit next determines, from the target sound direction information output by the target sound direction estimation unit 18, whether the target sound direction (speaker direction) lies within the set range. If it lies within the range, the interval is almost entirely speech, so the signal "2" is generated and sent to the spectral subtraction processing unit 30. If it lies outside the range, the interval is almost entirely noise, so the signal "0" is generated and sent to the spectral subtraction processing unit 30.

【０２６０】If the directionality index D is at or above the threshold and the target sound direction is within the set range, the interval is one in which a large sudden noise is superimposed on speech. The unit therefore judges that sudden noise is superimposed on the speech arriving from the speaker direction, generates the signal "1", and sends it to the spectral subtraction processing unit 30. If the directionality index D is at or above the threshold and the target sound direction is outside the set range, the interval is almost entirely noise, so the signal "0" is generated and sent to the spectral subtraction processing unit 30.
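
The decision rule described above can be summarized in a short Python sketch. The function and parameter names are hypothetical; the default values (threshold 1.0, tolerance ±20°) are the examples given in the text. Two details are assumptions made for the sketch: the range test is inclusive at the boundary, and the "at or below threshold" test is checked first, so that D exactly equal to the threshold yields "2" rather than "1".

```python
def ss_control_signal(d_index, target_angle_deg,
                      d_threshold=1.0, angle_tolerance_deg=20.0):
    """Return the control signal 0, 1, or 2 for the spectral subtraction stage.

    0: almost entirely noise
    1: large sudden noise superimposed on speech
    2: almost entirely speech
    target_angle_deg is measured from the center (front) of the two microphones.
    """
    in_range = abs(target_angle_deg) <= angle_tolerance_deg  # inclusive bound assumed
    if not in_range:
        return 0  # outside the speaker range: noise, whatever the value of D
    if d_index <= d_threshold:
        return 2  # in range and D at or below threshold: speech only
    return 1      # in range and D above threshold: sudden noise on speech
```

For example, `ss_control_signal(0.5, 10.0)` selects the speech-only case, while `ss_control_signal(1.5, 10.0)` selects the sudden-noise case.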

【０２６１】In this way, the spectral subtraction control unit 50 sends one of the three signals ("0", "1", or "2") to the spectral subtraction processing unit 30, based on the target sound direction (speaker direction) estimated by the target sound direction estimation unit 18 and the directionality index D computed by the directionality detection unit 40.

【０２６２】The spectral subtraction processing unit 30 receives the voice frequency components from the first beamformer 13 of the speaker tracking microphone array 10 and the noise frequency components from the second beamformer 16 (or the third beamformer 22), and performs the required noise suppression processing according to the three signals output from the spectral subtraction control unit 50.

【０２６３】The spectral subtraction processing unit 30 receives the signal from the spectral subtraction control unit 50. When the signal is "0", the interval can be regarded as almost entirely noise, so the minimum weight is applied and the output signal is cut. When the signal is "1", sudden noise is regarded as superimposed on the speech interval, and two-channel spectral subtraction is performed with the output of the second beamformer 16 treated as the noise component.

【０２６４】That is, when the signal received from the spectral subtraction control unit 50 is "0", the interval can be regarded as almost entirely noise. In the spectral subtraction processing unit 30, the control unit 35 therefore instructs the band weight calculation unit 33 to generate the minimum weight. The band weight calculation unit 33 generates the minimum weight and supplies it to the spectrum subtraction unit 34, which multiplies the voice frequency component output by that minimum weight and outputs the result, thereby cutting the output signal.

【０２６５】When the signal from the spectral subtraction control unit 50 is "1", sudden noise can be regarded as superimposed on the speech interval. On receiving this signal, the control unit 35 instructs the band weight calculation unit 33 to compute and output the normal weight coefficients, and the band weight calculation unit 33 derives the band weight coefficients from the outputs of the voice band power calculation unit 31 and the noise band power calculation unit 32.

【０２６６】The band weight calculation unit 33 supplies these band weight coefficients to the spectrum subtraction unit 34, which multiplies the voice frequency component output of the first beamformer 13 by the weights and outputs the result. This amounts to two-channel spectral subtraction in which the output of the second beamformer 16 is treated as the noise component, and the result can be output as the noise-suppressed voice frequency components.

【０２６７】When the signal from the spectral subtraction control unit 50 is "2", the interval can be regarded as speech only. In the spectral subtraction processing unit 30, the control unit 35, on receiving this signal, instructs the band weight calculation unit 33 to compute and output the normal weight coefficients, and the band weight calculation unit 33 derives the band weight coefficients from the outputs of the voice band power calculation unit 31 and the noise band power calculation unit 32.

【０２６８】The band weight calculation unit 33 supplies these band weight coefficients to the spectrum subtraction unit 34, which multiplies the voice frequency component output of the first beamformer 13 by the weights and outputs the result. This yields the result of one-channel spectral subtraction applied to the output of the first beamformer 13, which is output as the voice frequency component output.

【０２６９】As an alternative control method, an interval with signal "1" may be regarded as a noise interval, because a large sudden noise is superimposed on the speech, and processed in the same way as signal "0".
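
The three processing modes, including the alternative handling of signal "1", can be sketched in Python as below. This is an illustration under assumptions rather than the patent's method: the weight formula `max(1 - N/S, W_MIN)` is a common spectral subtraction form substituted here because this passage does not give the formula, and the minimum weight, the band count, the equal-width band split, and all names are hypothetical. The passage describes the same weight computation for signals "1" and "2", so the sketch treats them identically and leaves the choice of noise estimate to the caller.

```python
import numpy as np

W_MIN = 0.01  # assumed minimum weight used to cut the output

def band_powers(spec, n_bands):
    """Mean power in each of n_bands contiguous frequency bands."""
    power = np.abs(np.asarray(spec)) ** 2
    return np.array([band.mean() for band in np.array_split(power, n_bands)])

def spectral_subtract(voice_spec, noise_spec, signal,
                      n_bands=16, treat_1_as_noise=False):
    """Weight the voice spectrum of one frame according to control signal 0/1/2.

    voice_spec: voice frequency components (first beamformer output).
    noise_spec: noise frequency components (noise estimate for this frame).
    Requires len(voice_spec) >= n_bands.
    """
    voice_spec = np.asarray(voice_spec, dtype=complex)
    if signal == 0 or (signal == 1 and treat_1_as_noise):
        return W_MIN * voice_spec  # noise interval: cut the output
    s = band_powers(voice_spec, n_bands)  # voice band power
    n = band_powers(noise_spec, n_bands)  # noise band power
    # Assumed weight rule: subtract the noise-to-voice power ratio, floored at W_MIN.
    weights = np.maximum(1.0 - n / np.maximum(s, 1e-12), W_MIN)
    sizes = [len(band) for band in np.array_split(voice_spec, n_bands)]
    return np.repeat(weights, sizes) * voice_spec  # expand band weights to bins
```

Setting `treat_1_as_noise=True` corresponds to the alternative control method in which signal "1" is handled like signal "0".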

【０２７０】This device can extract only the speech component, with little distortion, while suppressing both directional and non-directional noise components. Even in the presence of sudden noise, it can extract a low-distortion speech component with the noise suppressed.

【０２７１】(Specific configuration example 2 of the noise suppression processing device of the present invention) This specific configuration example is shown in FIG. 18. Here the speaker tracking microphone array 10 uses the configuration of FIG. 7, the spectral subtraction processing unit 30 uses the configuration of FIG. 10, and the directionality detection unit 40 uses the configuration of FIG. 12 or FIG. 14.

【０２７２】The operation of the system configured in this way is described below.

【０２７３】First, the voice input unit 11, which has a plurality of microphones (in this example two, the first and second microphones m1 and m2), captures the sound of channels ch1 and ch2. The two channels of sound signals ch1 and ch2 input from the voice input unit 11 (the first channel ch1 is the sound from the first microphone m1, and the second channel ch2 is the sound from the second microphone m2) are sent to the frequency analysis unit 12, where processing such as a fast Fourier transform (FFT) yields the frequency components (frequency spectrum) of each channel.

【０２７４】The per-channel frequency components obtained by the frequency analysis unit 12 are supplied to the first, second, and third beamformers 13, 16, and 22.

【０２７５】The first beamformer 13 aligns the phase of the two channels of frequency-component input to the target sound direction and then processes them with the frequency-domain adaptive filter as described above, thereby suppressing noise and outputting the frequency components arriving from the target sound direction.

【０２７６】That is, the first input direction correction unit 14 supplies the first beamformer 13 with angle information (α) as follows. Using the output of the voice direction estimation unit 18 or the voice direction estimation unit 23, supplied via the effective noise determination unit 24, the first input direction correction unit 14 supplies the first beamformer 13, as an input direction correction amount, with the angle information (α) needed to align the input phase of the two channels of frequency components so that the target sound appears to arrive from the front of the microphones.

【０２７７】As a result, the first beamformer 13 corrects the target sound direction according to the correction amount (α) and suppresses sound arriving from directions other than the target sound direction, thereby suppressing the noise component and extracting the target sound.

【０２７８】For the second and third beamformers 16 and 22, in contrast, the noise is the target sound, so their phase is aligned to the noise. As a result, the second and third beamformers 16 and 22 treat the speaker's sound source as a noise source, and the adaptive filter built into each beamformer performs processing that extracts the sound from the speaker source. The parameters of the adaptive filters of the second and third beamformers 16 and 22 therefore yield information that reflects the direction of the speaker source.

【０２７９】Therefore, when the first or second voice direction estimation unit 18 or 23 determines the noise source direction from the adaptive filter parameters of the second or third beamformer 16 or 22, that direction reflects the direction of the speaker source, which is the target sound. Accordingly, the first or second voice direction estimation unit 18 or 23 produces an output reflecting the adaptive filter parameters of the second or third beamformer 16 or 22, the first input direction correction unit 14 generates an input direction correction amount (α) from this output, and the target sound direction of the first beamformer 13 is corrected by this amount. The first beamformer 13 then suppresses sound arriving from directions other than the target sound direction, so the component from the speaker source can be extracted.

【０２８０】Meanwhile, the adaptive filter of the first beamformer 13 has its parameters controlled so that the noise component is extracted. From these parameters the noise direction estimation unit 17 estimates the noise direction and supplies that information to the second and third input direction correction units 15 and 21 and to the effective noise determination unit 24.

【０２８１】On receiving the output of the noise direction estimation unit 17, the second input direction correction unit 15 generates an input direction correction amount (α) from it, and the target sound direction of the second beamformer 16 is corrected by this amount. The second beamformer 16 then suppresses sound arriving from directions other than its target sound direction, so the noise component, that is, the component from sources other than the speaker, can be extracted.

【０２８２】At this time, the adaptive filter of the second beamformer 16 has its parameters controlled so that the speaker's voice component, which is the target sound, is extracted, so the first voice direction estimation unit 18 can estimate the speaker voice direction from these parameters. The first voice direction estimation unit 18 supplies the estimated information to the effective noise determination unit 24.

【０２８３】The output of the noise direction estimation unit 17 is also supplied to the third input direction correction unit 21, which generates an input direction correction amount (α) from that output and supplies it to the third beamformer 22. The third beamformer 22 then corrects its own target sound direction according to the supplied correction amount.

【０２８４】As a result, the third beamformer 22 suppresses sound arriving from directions other than its target sound direction, so the component from sources other than the speaker, that is, the noise component, can be extracted.

【０２８５】At this time, the adaptive filter of the third beamformer 22 has its parameters controlled so that the speaker's voice component, which is the target sound, is extracted, so the second voice direction estimation unit 23 can estimate the speaker voice direction from these parameters. This estimated information is supplied to the effective noise determination unit 24.

【０２８６】Based on the speaker voice direction estimates supplied by the first and second voice direction estimation units 18 and 23 and the noise direction estimate supplied by the noise direction estimation unit 17, the effective noise determination unit 24 judges which of the second beamformer 16 and the third beamformer 22 is tracking the noise effectively. Based on this judgment, it supplies the adaptive filter parameters of the beamformer judged to be tracking effectively to the first input direction correction unit 14.

【０２８７】The first input direction correction unit 14 therefore produces an output reflecting these parameters, generates the input direction correction amount (α) from it, and corrects the target sound direction of the first beamformer 13 by this amount. The first beamformer 13 thus suppresses sound arriving from directions other than the target sound direction, so the component from the speaker source can be extracted. Moreover, when the noise comes from a source that moves over a wide range, the moving noise source is tracked reliably without being lost, and the noise can be removed.

【０２８８】However, this alone cannot cope with sudden noise, so here sudden noise is handled by the configuration consisting of the subtraction processing unit 30, the directionality detection unit 40, and the subtraction control unit 50.

【０２８９】That is, a directionality detection unit based on short-time frequency analysis is added. The directionality detection unit 40 detects the arrival of sudden noise, fast-moving sound sources, and the like that beamformer processing cannot suppress, and this detection result is used to control the spectral subtraction performed on the speaker voice component and the noise component obtained by the noise suppression processing device. This preserves a speaker tracking function in which the allowable range of the speaker direction can be set with high precision, while also making it possible to suppress sudden noise, fast-moving sound sources, and the like.

【０２９０】That is, as described above, the directionality detection unit 40 computes the directionality index D from a frequency analysis such as a short-time FFT, using not only the phase difference T between the two microphones but also the power ratio R of the input signals of the channels. It then supplies the computed directionality index D to the spectral subtraction control unit 50.

【０２９１】The spectral subtraction control unit 50 generates one of three signals ("0", "1", or "2") based on the directionality index D and on the target sound direction (speaker direction) obtained from the target sound direction information output by the target sound direction estimation unit 18, and sends it to the spectral subtraction processing unit 30.

【０２９２】Of the three signals, "0" indicates an interval that is almost entirely noise, "1" indicates an interval in which a large sudden noise is superimposed on speech, and "2" indicates an interval that is almost entirely speech.

【０２９３】The signal is determined as follows. First, two settings are chosen: the allowable range of the speaker direction, in degrees from the center of the two microphones, and the threshold for the directionality index D. For example, the allowable range of the speaker direction may be set to ±20° from the center of the two microphones and the threshold for D to 1.0; the values are chosen to suit the usage environment.

【０２９４】The unit then determines whether the directionality index D sent from the directionality detection unit 40 is at or below the threshold (1.0). If it is, the unit next determines, from the target sound direction information output by the target sound direction estimation unit 18, whether the target sound direction (speaker direction) lies within the set range. If it lies within the range, the interval is almost entirely speech, so the signal "2" is generated and sent to the spectral subtraction processing unit 30. If it lies outside the range, the interval is almost entirely noise, so the signal "0" is generated and sent to the spectral subtraction processing unit 30.

【０２９５】If the directionality index D is at or above the threshold and the target sound direction is within the set range, the interval is one in which a large sudden noise is superimposed on speech. The unit therefore judges that sudden noise is superimposed on the speech arriving from the speaker direction, generates the signal "1", and sends it to the spectral subtraction processing unit 30. If the directionality index D is at or above the threshold and the target sound direction is outside the set range, the interval is almost entirely noise, so the signal "0" is generated and sent to the spectral subtraction processing unit 30.

【０２９６】In this way, the spectral subtraction control unit 50 sends one of the three signals ("0", "1", or "2") to the spectral subtraction processing unit 30, based on the target sound direction (speaker direction) estimated by the target sound direction estimation unit 18 and the directionality index D computed by the directionality detection unit 40.

【０２９７】The spectral subtraction processing unit 30 receives the voice frequency components from the first beamformer 13 of the speaker tracking microphone array 10 and the noise frequency components from the second beamformer 16 (or the third beamformer 22), and performs the required noise suppression processing according to the three signals output from the spectral subtraction control unit 50.

【０２９８】The spectral subtraction processing unit 30 receives the signal from the spectral subtraction control unit 50. When the signal is "0", the interval can be regarded as almost entirely noise, so the minimum weight is applied and the output signal is cut. When the signal is "1", sudden noise is regarded as superimposed on the speech interval, and two-channel spectral subtraction is performed with the output of the second beamformer 16 treated as the noise component.

【０２９９】That is, when the signal received from the spectral subtraction control unit 50 is "0", the interval can be regarded as almost entirely noise. In the spectral subtraction processing unit 30, the control unit 35 therefore instructs the band weight calculation unit 33 to generate the minimum weight. The band weight calculation unit 33 generates the minimum weight and supplies it to the spectrum subtraction unit 34, which multiplies the voice frequency component output by that minimum weight and outputs the result, thereby cutting the output signal.

【０３００】When the signal from the spectral subtraction control unit 50 is "1", sudden noise can be regarded as superimposed on the speech interval. On receiving this signal, the control unit 35 instructs the band weight calculation unit 33 to compute and output the normal weight coefficients, and the band weight calculation unit 33 derives the band weight coefficients from the outputs of the voice band power calculation unit 31 and the noise band power calculation unit 32.

【０３０１】The band weight calculation unit 33 supplies these band weight coefficients to the spectrum subtraction unit 34, which multiplies the voice frequency component output of the first beamformer 13 by the weights and outputs the result. This amounts to two-channel spectral subtraction in which the output of the second beamformer 16 is treated as the noise component, and the result can be output as the noise-suppressed voice frequency components.

【０３０２】When the signal from the spectral subtraction control unit 50 is "2", the interval can be regarded as speech only. In the spectral subtraction processing unit 30, the control unit 35, on receiving this signal, instructs the band weight calculation unit 33 to compute and output the normal weight coefficients, and the band weight calculation unit 33 derives the band weight coefficients from the outputs of the voice band power calculation unit 31 and the noise band power calculation unit 32.

【０３０３】The band weight calculation unit 33 then passes the obtained band weighting coefficients to the spectral subtraction unit 34, which multiplies the voice frequency components output from the first beamformer 13 by these weights and outputs the result. This yields the equivalent of one-channel spectral subtraction applied to the output of the first beamformer 13, and the result is output as the voice frequency components.

【０３０４】As another control method, as described above, when the signal is "1" a large sudden noise is superimposed on the voice, so the section may instead be regarded as a noise section and processed in the same way as for signal "0".

【０３０５】This apparatus can extract only a low-distortion voice component in which both directional and non-directional noise components are suppressed, and can likewise extract a low-distortion voice component with the noise suppressed even in the presence of sudden noise.

【０３０６】Next, a noise suppression processing apparatus that further improves the accuracy of specific configuration example 1 is described as specific configuration example 2.

【０３０７】(Another example of spectral subtraction (SS) processing) This specific configuration example is shown in FIG. 18. Here the configuration of FIG. 7 is used as the speaker-tracking microphone array 10, and the configuration shown in FIG. 19 is applied as the spectral subtraction processing unit 30.

【０３０８】This embodiment makes more accurate noise suppression possible by correcting the power of the noise component in the spectral subtraction (SS) processing. The example described above assumes that the power N of the noise source is small. Consequently, when spectral subtraction (SS) processing is performed, there remains a concern that distortion increases in portions where the noise source component is superimposed on the voice.

【０３０９】Here, therefore, the calculation of the band weights in the spectral subtraction processing is corrected using the power of the input signal (the output of the frequency analysis unit 12).

【０３１０】First, let $P_v$ be the voice output power, $V$ the power of the voice component, $B'$ the background noise power contained in the voice output, $P_n$ the noise output power, $N$ the power of the noise source component, $B''$ the background noise component contained in the noise output, and $P_x$ the power of the input signal in which no signal is suppressed. Then

$$P_x = V + N + B, \qquad P_v = V + B', \qquad P_n = N + B''.$$

Assuming $B \approx B' \approx B''$, the power $P_b$ of the true background noise component is

$$P_b = P_v + P_n - P_x = V + B' + N + B'' - (V + N + B) = B' + B'' - B = B.$$

The spectral subtraction (SS) weight using this noise power can be calculated as

$$W = \frac{P_v - P_b}{P_v} = \frac{P_x - P_n}{P_v},$$

so that SS processing with little distortion can be performed even when the background noise is non-stationary and $N$ is large.
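
The corrected weight can be illustrated with a short sketch. The function name `corrected_band_weight`, the `w_min` parameter, and the clamping of the result to the range [w_min, 1] are assumptions added for illustration; the text itself only gives the formula W = (Px − Pn) / Pv.

```python
def corrected_band_weight(p_v, p_n, p_x, w_min=0.0):
    """Return the SS weight W = (Px - Pn) / Pv for one band.

    p_v: voice output power (Pv), p_n: noise output power (Pn),
    p_x: unsuppressed input power (Px).
    Clamping to [w_min, 1.0] is an assumption, not stated in the text.
    """
    if p_v <= 0.0:
        # No voice-output power in this band: fall back to the minimum weight.
        return w_min
    w = (p_x - p_n) / p_v
    return max(w_min, min(1.0, w))


# Worked example with V = 4, N = 3, B = B' = B'' = 1:
# Px = 8, Pv = 5, Pn = 4, so Pb = Pv + Pn - Px = 1 and W = (8 - 4) / 5.
w = corrected_band_weight(p_v=5.0, p_n=4.0, p_x=8.0)
```

In this example the weight equals V / Pv, that is, the fraction of the voice output power that belongs to the voice component, which is what the derivation above predicts when B ≈ B′ ≈ B″.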

【０３１１】FIG. 19 shows a configuration example of the spectral subtraction processing unit 30 used in this embodiment, and FIG. 20 shows the processing flow. In FIG. 19, 31 is a voice band power calculation unit, 32 is a noise band power calculation unit, 34 is a spectral subtraction unit, 35 is a control unit, and 37 is an input signal band power calculation unit.

【０３１２】Of these, the voice band power calculation unit 31 divides the voice frequency components obtained by the first beamformer 13 into frequency bands, calculates the voice power for each band, averages the calculated power values in the time direction to obtain the average power for each band (the average voice band power Pv(k)), and supplies it to the band weight calculation unit 33. The noise band power calculation unit 32 calculates the noise band power from the noise frequency components obtained by the second beamformer 16 (or the third beamformer 22) and selected and output by the effective noise determination unit 24, averages the calculated power values in the time direction to obtain the average power for each band (the average noise band power Pn(k)), and supplies it to the band weight calculation unit 33.

【０３１３】The input signal band power calculation unit 37 divides the frequency spectrum components of the input signal obtained from the frequency analysis unit 12 (either ch1 or ch2) into frequency bands, calculates the input power for each band, averages the calculated power values in the time direction to obtain the average power for each band (the average input signal band power Px(k)), and supplies it to the band weight calculation unit 33.
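
The band power calculation performed by units 31, 32 and 37 can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify how the band power is formed or how the time averaging is done, so the sum of squared magnitudes per band, the exponential averaging, and the names `band_powers`, `smooth`, `band_edges` and `alpha` are all hypothetical.

```python
def band_powers(spectrum, band_edges):
    """Power per band: sum of |X(f)|^2 over bins lo..hi-1 of each band.

    spectrum: list of (complex) frequency components for one frame.
    band_edges: bin indices delimiting the bands, e.g. [0, 2, 4].
    """
    return [
        sum(abs(x) ** 2 for x in spectrum[lo:hi])
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


def smooth(prev_avg, current, alpha=0.9):
    """Time averaging per band (exponential smoothing, an assumed choice)."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_avg, current)]


# One frame with four bins split into two bands of two bins each.
frame = [1 + 0j, 2j, 3, 4]
p = band_powers(frame, [0, 2, 4])          # per-band power of this frame
avg = smooth([0.0, 0.0], p, alpha=0.5)     # running average per band
```

The same two functions would serve for the voice, noise and input spectra alike, since the text describes the three calculations as being carried out in the same manner.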

【０３１４】For each band k, the band weight calculation unit 33 calculates the band weighting coefficient W(k) from the obtained average voice band power Pv(k), the average noise band power Pn(k), and the average input band power Px(k) calculated by the input signal band power calculation unit 37. The spectral subtraction unit 34 uses the band weighting coefficient W(k) obtained for each band by the band weight calculation unit 33 and multiplies each frequency band of the voice signal by that coefficient to suppress the background noise.

【０３１５】The control unit 35 receives the signal from the spectral subtraction control unit 50 and controls the band weight calculation unit 33 according to the signal type. When the signal from the spectral subtraction control unit 50 is "0", it instructs the band weight calculation unit 33 to generate the minimum weight, so that the output of the band weight calculation unit 33 is the minimum weight. When the signal from the spectral subtraction control unit 50 is "1" or "2", it controls the band weight calculation unit 33 so as to obtain normal weighting coefficients.

【０３１６】The spectral subtraction unit 34 uses the weighting coefficients output by the band weight calculation unit 33, multiplies the output of the first beamformer 13 by them, and outputs the result as a signal of voice frequency components in which the noise component is suppressed.

【０３１７】The configuration of the spectral subtraction (SS) processing unit 30 shown in FIG. 19 differs from that of the spectral subtraction (SS) processing unit 30 of FIG. 10 in that it additionally uses the frequency components of the input signal in which nothing is suppressed (the output of the frequency analysis unit 12).

【０３１８】Because there are two microphones, the frequency analysis unit 12 has two outputs, ch1 and ch2; either may be used. For the input signal frequency components from the frequency analysis unit 12, the input signal band power calculation unit 37 calculates the power for each band, in the same way as for the voice frequency components or noise frequency components from the beamformers (step S61).

【０３１９】As in FIG. 10, the voice frequency components are supplied as the output of the first beamformer 13, and the noise frequency components are supplied as the output of the second beamformer 16 (or the third beamformer 22). The voice band power calculation unit 31 therefore calculates the voice band power from the voice frequency components output from the first beamformer 13 (step S62), and the noise band power calculation unit 32 calculates the noise band power from the noise frequency components output from the second beamformer 16 (or the third beamformer 22) (step S63).

【０３２０】Using these, the band weight calculation unit 33 obtains the weighting coefficients as described above (step S64).

【０３２１】The spectral subtraction unit 34 then multiplies the voice frequency components output from the first beamformer 13 by the obtained weighting coefficients and outputs the result, thereby outputting voice frequency components that have undergone spectral subtraction processing (step S65).
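
Step S65 amounts to scaling every frequency bin of the voice output by the weight of the band it belongs to. The sketch below assumes that bands are given as contiguous bin ranges; the names `apply_band_weights` and `band_edges` are hypothetical and are not taken from the text.

```python
def apply_band_weights(spectrum, band_edges, weights):
    """Multiply each bin of the voice spectrum by its band's weight W(k).

    spectrum: frequency components output by the first beamformer (one frame).
    band_edges: bin indices delimiting the bands, e.g. [0, 2, 4].
    weights: one weighting coefficient per band.
    """
    out = list(spectrum)
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), w in zip(bands, weights):
        for i in range(lo, hi):
            out[i] = w * spectrum[i]
    return out


# Two bands of two bins each: the first is halved, the second is cut.
suppressed = apply_band_weights([1 + 1j, 2, 3, 4], [0, 2, 4], [0.5, 0.0])
```

Because only a real-valued gain is applied per band, the phase of each bin is left unchanged, which is the usual property of weight-based spectral subtraction.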

【０３２２】The calculation of the weighting coefficients by the band weight calculation unit 33 is carried out under the same control as described above, exercised by the direction detection unit 40 and the spectral subtraction control unit 50. That is, as described above, the direction detection unit 40 calculates the directionality index D on the basis of frequency analysis such as a short-time FFT, using not only the phase difference T between the two microphones but also the power ratio R of the input signals of the channels. The obtained directionality index D is then supplied to the spectral subtraction control unit 50.

【０３２３】On the basis of this directionality index D and the target sound direction (speaker direction) given by the target sound direction information output from the target sound direction estimation unit 18, the spectral subtraction control unit 50 generates one of three signals ("0", "1", "2") and sends it to the spectral subtraction processing unit 30.

【０３２４】Of the three signals, signal "0" indicates a section containing almost only noise, signal "1" indicates a section in which a large sudden noise is superimposed on a voice section, and signal "2" indicates a section containing almost only voice.

【０３２５】The signal is determined as follows. First, the allowable range of the speaker direction is set as an angle measured from the center of the two microphones, and a threshold is set for the directionality index D. For example, the allowable range of the speaker direction may be set to ±20° from the center of the two microphones and the threshold of the directionality index D to "1.0"; the optimum values are chosen to suit the usage environment.

【０３２６】It is then determined whether the directionality index D sent from the directionality detection unit 40 is equal to or less than the threshold (1.0). If it is, it is next determined from the target sound direction information output from the target sound direction estimation unit 18 whether the target sound direction (speaker direction) is within the set range. If it is within the set range, the section contains almost only voice, so signal "2" is generated and sent to the spectral subtraction processing unit 30. If it is outside the set range, the section contains almost only noise, so signal "0" is generated and sent to the spectral subtraction processing unit 30.

【０３２７】If the directionality index D is equal to or greater than the threshold and the target sound direction is within the set range, the section is one in which a large sudden noise is superimposed on a voice section. It is therefore determined that sudden noise is superimposed on the voice arriving from the speaker direction, and signal "1" is generated and sent to the spectral subtraction processing unit 30. If the directionality index D is equal to or greater than the threshold and the target sound direction is outside the set range, the section contains almost only noise, so signal "0" is generated and sent to the spectral subtraction processing unit 30.
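
The decision rule of the spectral subtraction control unit 50 can be summarized in a short sketch. The function and parameter names are hypothetical, the default values are the example values given in the text (±20° and 1.0), and two points are assumptions: the target direction is expressed in degrees relative to the center of the two microphones, and a value of D exactly equal to the threshold is treated as "equal to or less than", since the text leaves that boundary case open.

```python
def control_signal(d_index, target_dir_deg, d_threshold=1.0, dir_tolerance_deg=20.0):
    """Return 0 (almost only noise), 1 (sudden noise on voice) or 2 (almost only voice).

    d_index: directionality index D from the directionality detection unit.
    target_dir_deg: estimated target sound (speaker) direction in degrees,
                    measured from the center of the two microphones (assumed).
    """
    in_range = abs(target_dir_deg) <= dir_tolerance_deg
    if not in_range:
        # Target direction outside the allowed speaker range: noise section.
        return 0
    # Inside the speaker range: D decides between voice only and voice plus sudden noise.
    return 2 if d_index <= d_threshold else 1


signals = [
    control_signal(0.5, 10.0),   # low D, speaker in range
    control_signal(0.5, 45.0),   # low D, speaker out of range
    control_signal(1.5, -10.0),  # high D, speaker in range
    control_signal(1.5, 30.0),   # high D, speaker out of range
]
```

Note that an out-of-range target direction yields signal "0" regardless of D, so the direction test can be performed first without changing the outcome described in the text.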

【０３２８】In this way, the spectral subtraction control unit 50 sends one of three signals ("0", "1", "2") to the spectral subtraction processing unit 30 on the basis of the target sound direction (speaker direction) estimated by the target sound direction estimation unit 18 and the directionality index D calculated by the directionality detection unit 40.

【０３２９】The spectral subtraction processing unit 30 receives the voice frequency components supplied from the first beamformer 13 of the speaker-tracking microphone array 10 and the noise frequency components supplied from the second beamformer 16 (or the third beamformer 22), and performs the required noise suppression processing according to the three types of signal output from the spectral subtraction control unit 50.

【０３３０】The spectral subtraction processing unit 30 receives the signal from the spectral subtraction control unit 50. When the signal is "0", the section can be regarded as almost entirely noise, so the minimum weight is applied and the output signal is cut. When the signal from the spectral subtraction control unit 50 is "1", sudden noise is regarded as superimposed on the voice section, and two-channel spectral subtraction processing is performed in which the output of the second beamformer 16 is treated as the noise component.

【０３３１】That is, when the signal received from the spectral subtraction control unit 50 is "0", the section can be regarded as almost entirely noise. In the spectral subtraction processing unit 30, the control unit 35 therefore instructs the band weight calculation unit 33 to generate the minimum weight. The band weight calculation unit 33 generates the minimum weight and supplies it to the spectral subtraction unit 34, which multiplies the voice frequency component output by this minimum weight and outputs the result, thereby cutting the output signal.

【０３３２】When the signal from the spectral subtraction control unit 50 is "1", sudden noise can be regarded as superimposed on the voice section. The control unit 35, on receiving this signal, therefore instructs the band weight calculation unit 33 to calculate and output normal weighting coefficients, and the band weight calculation unit 33 obtains the band weighting coefficients from the outputs of the voice band power calculation unit 31 and the noise band power calculation unit 32.

【０３３３】The band weight calculation unit 33 then passes the obtained band weighting coefficients to the spectral subtraction unit 34, which multiplies the voice frequency components output from the first beamformer 13 by these weights and outputs the result. This amounts to two-channel spectral subtraction in which the output of the second beamformer 16 is treated as the noise component, and the result can be output as noise-suppressed voice frequency components.

【０３３４】When the signal from the spectral subtraction control unit 50 is "2", the section can be regarded as containing voice only. In the spectral subtraction processing unit 30, the control unit 35, on receiving this signal, instructs the band weight calculation unit 33 to calculate and output normal weighting coefficients, and the band weight calculation unit 33 obtains the band weighting coefficients from the outputs of the voice band power calculation unit 31 and the noise band power calculation unit 32.

【０３３５】The band weight calculation unit 33 then passes the obtained band weighting coefficients to the spectral subtraction unit 34, which multiplies the voice frequency components output from the first beamformer 13 by these weights and outputs the result. This yields the equivalent of one-channel spectral subtraction applied to the output of the first beamformer 13, and the result is output as the voice frequency components.

【０３３６】As another control method, as described above, when the signal is "1" a large sudden noise is superimposed on the voice, so the section may instead be regarded as a noise section and processed in the same way as for signal "0".
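
The handling of the three control signals by the control unit 35, including the alternative treatment of signal "1", can be sketched as below. The names `select_weights`, `w_min` and `treat_1_as_noise` are hypothetical, and the numerical value of the minimum weight is an assumption, since the text does not state it.

```python
def select_weights(signal, normal_weights, w_min=0.0, treat_1_as_noise=False):
    """Choose the per-band weights for one frame from the control signal.

    signal: 0 (almost only noise), 1 (sudden noise on voice), 2 (almost only voice).
    normal_weights: weights computed from the band powers for this frame.
    treat_1_as_noise: if True, handle signal 1 like signal 0 (alternative control).
    """
    if signal == 0 or (signal == 1 and treat_1_as_noise):
        # Noise section: every band gets the minimum weight, cutting the output.
        return [w_min] * len(normal_weights)
    if signal in (1, 2):
        # Voice present: use the normally computed weighting coefficients.
        return list(normal_weights)
    raise ValueError("control signal must be 0, 1 or 2")


w = [0.8, 0.3]
cut = select_weights(0, w)                               # minimum weights
normal = select_weights(1, w)                            # normal weights
cut_alt = select_weights(1, w, treat_1_as_noise=True)    # alternative control
```

Keeping the alternative behavior behind a flag reflects that the text presents it as an option rather than as the default control method.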

【０３３７】This apparatus can extract only a low-distortion voice component in which both directional and non-directional noise components are suppressed, and can likewise extract a low-distortion voice component with the noise suppressed even in the presence of sudden noise. Moreover, because the power of the noise component is corrected in the spectral subtraction (SS) processing, a noise suppression processing apparatus capable of more accurate noise suppression can be provided.

【０３３８】As a result, it becomes possible to extract only a low-distortion voice component in which both directional and non-directional noise components are suppressed.

【０３３９】As described above, this example is characterized in that the noise suppression apparatus is provided with input band power calculation means that divides into frequency bands the frequency components of the input signal, obtained by frequency analysis of the input signal from the voice input means, and calculates the input power for each band, and in that the spectral subtraction means is configured to suppress the background noise by weighting each frequency band of the voice signal on the basis of the input band power, the voice band power, and the noise band power.

【０３４０】In this configuration, the voice band power calculation means divides the obtained spectral components of the voice frequencies into frequency bands and calculates the voice power for each band, and the noise band power calculation means divides the obtained spectral components of the noise frequencies into frequency bands and calculates the noise power for each band. The input band power calculation means receives the frequency spectrum components of the input voice, obtained by frequency analysis of the input signal from the voice input means, divides them into frequency bands, and calculates the input power for each band. The spectral subtraction means then suppresses the background noise by weighting each frequency band of the voice signal on the basis of the voice and noise frequency band powers obtained from the voice band power calculation means and the noise band power calculation means.

【０３４１】In this embodiment, the power of the noise component is additionally corrected in the spectral subtraction processing of configuration example 1, which makes still more accurate noise suppression possible. That is, the third invention assumes that the power N of the noise source is small, so that when spectral subtraction processing is performed, increased distortion is unavoidable in portions where the noise source component is superimposed on the voice. Here, this point is addressed by using the power of the input signal to correct the calculation of the band weighting coefficients in the spectral subtraction processing.

【０３４２】As a result, it becomes possible to extract only a low-distortion voice component in which both directional and non-directional noise components are suppressed.

【０３４３】Various embodiments have been described above. In a first aspect, the present invention comprises: voice input means for receiving a speaker's voice at two or more different positions and outputting each as a voice signal; frequency analysis means for performing frequency analysis on each channel of the voice signals corresponding to the sound receiving positions and outputting frequency components for each channel; first beamformer processing means for suppressing arriving noise other than the target voice by adaptive filter processing using the frequency components of the plurality of channels output by the frequency analysis means, and outputting a signal of the target voice component; second beamformer processing means for suppressing the target voice by adaptive filter processing using the frequency components of the plurality of channels output by the frequency analysis means, and outputting a signal of the noise component; noise direction estimation means for estimating the noise direction from the filter coefficients calculated by the first beamformer processing means; target sound direction estimation means for estimating the target sound direction from the filter coefficients calculated by the second beamformer processing means; first input direction correction means for successively correcting a first input direction, which is the arrival direction of the target sound to be input to the first beamformer processing means, on the basis of the target sound direction estimated by the target sound direction estimation means; second input direction correction means for successively correcting a second input direction, which is the arrival direction of the noise to be input to the second beamformer processing means, on the basis of the noise direction estimated by the noise direction estimation means; spectral subtraction means for performing spectral subtraction processing, which is non-linear noise suppression processing, on the basis of the output of the first beamformer processing means and the output of the second beamformer processing means; directionality detection means for calculating, from the frequency components output from the frequency analysis means, a directionality index based on the time difference and amplitude difference of the arriving sound; and spectral subtraction control means for controlling the spectral subtraction processing of the spectral subtraction means on the basis of the directionality index and the target sound direction output from the target sound direction estimation means.

【０３４４】In this configuration, the voice input means receives the voice uttered by the speaker at two or more different positions, and the frequency analysis means performs frequency analysis on each channel of the voice signals corresponding to the sound receiving positions and outputs the frequency components of the plurality of channels. The first beamformer processing means applies adaptive filter processing to the frequency components of the plurality of channels obtained by the frequency analysis means, using filter coefficients calculated so that sensitivity outside the desired direction is low, and thereby performs arriving-noise suppression processing that suppresses sound other than the voice from the speaker direction and obtains the target voice component. The second beamformer processing means applies adaptive filter processing to the frequency components of the plurality of channels obtained by the frequency analysis means, using filter coefficients calculated so that sensitivity outside the desired direction is low, and thereby suppresses the voice from the speaker direction and obtains the noise component. The noise direction estimation means estimates the noise direction from the filter coefficients calculated by the first beamformer processing means, and the target sound direction estimation means estimates the target sound direction from the filter coefficients calculated by the second beamformer processing means. The target sound direction correction means successively corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, on the basis of the target sound direction estimated by the target sound direction estimation means, so that the first beamformer suppresses noise components arriving from directions other than the first input direction and extracts the speaker's voice component with low noise. The noise direction correction means successively corrects the second input direction, which is the arrival direction of the noise to be input to the second beamformer, on the basis of the noise direction estimated by the noise direction estimation means, so that the second beamformer suppresses components arriving from directions other than the second input direction and extracts the noise component that remains after the speaker's voice component has been suppressed.

[0345] As described above, this system can separately obtain a voice frequency component in which the noise component has been suppressed and a noise frequency component in which the voice component has been suppressed. The most important features of the invention are, first, that beamformers operating in the frequency domain are used as the first and second beamformers and, second, that, in order to cope with sudden noise as well, a directionality detection means is incorporated that uses short-time data to obtain, with high accuracy, a directionality index for deciding whether an incoming sound arrived from the target direction, and spectral subtraction is controlled on the basis of this directionality index and the speaker direction obtained in the conventional processing, so that sudden noise is suppressed.
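The directionality index mentioned above can be illustrated with a minimal sketch. The formula used by the directionality detection means is not given in this passage, so everything below is an assumption made for illustration: two channels, a per-bin delay taken from the cross-spectrum phase, a per-bin level difference in dB, and hypothetical names and tolerances (`directionality_index`, `max_delay_err_s`, `max_level_diff_db`).

```python
import cmath
import math


def directionality_index(ch1, ch2, freqs_hz, target_delay_s=0.0,
                         max_delay_err_s=5e-5, max_level_diff_db=3.0):
    """Fraction of bins whose inter-channel time difference and amplitude
    difference are consistent with a source in the target direction.

    ch1, ch2: complex spectra of one short frame (same length as freqs_hz).
    All thresholds are illustrative assumptions, not values from the patent.
    """
    consistent = 0
    counted = 0
    for x1, x2, f in zip(ch1, ch2, freqs_hz):
        if f <= 0.0 or abs(x1) == 0.0 or abs(x2) == 0.0:
            continue  # skip DC and empty bins
        counted += 1
        # Phase of the cross-spectrum divided by 2*pi*f gives the delay of
        # channel 2 relative to channel 1 (valid while |phase| < pi).
        delay = cmath.phase(x1 * x2.conjugate()) / (2.0 * math.pi * f)
        level_diff_db = 20.0 * math.log10(abs(x1) / abs(x2))
        if (abs(delay - target_delay_s) <= max_delay_err_s
                and abs(level_diff_db) <= max_level_diff_db):
            consistent += 1
    return consistent / counted if counted else 0.0
```

Because the index is computed from a single short frame, it can react to a brief burst of noise that an adaptive filter would not have time to track, which is the role this paragraph assigns to it.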

[0346] Accordingly, by deciding between target sound and noise on the basis of both the directionality index described above and the speaker direction obtained from the directivity of the beamformer filter, sounds arriving from outside the set speaker range can be removed, and short-duration signals such as sudden noise can also be removed with high accuracy, so that noise suppression in a real environment can be performed with very high accuracy.
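As a rough sketch of this two-part decision, a frame can be treated as target sound only when both conditions hold. The function name, the threshold, and the speaker range below are hypothetical; the actual control flow of the spectral subtraction control means is not reproduced here.

```python
def classify_frame(direction_index, speaker_angle_deg,
                   index_threshold=0.5, speaker_range_deg=(-20.0, 20.0)):
    """Label a frame "target" only if the short-time directionality index is
    high AND the beamformer-derived speaker direction lies in the set range.

    index_threshold and speaker_range_deg are illustrative assumptions.
    """
    low, high = speaker_range_deg
    in_speaker_range = low <= speaker_angle_deg <= high
    if in_speaker_range and direction_index >= index_threshold:
        return "target"
    return "noise"
```

A frame labelled "noise" could then be subtracted more aggressively than one labelled "target"; how strongly is a design choice that this passage leaves open.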

[0347] In addition, using beamformers that operate in the frequency domain as the first and second beamformers greatly reduces the amount of computation.
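One way to see where the saving comes from: in the frequency domain an adaptive filter can reduce to one complex coefficient per bin, updated independently. The sketch below uses a per-bin normalized LMS step. The adaptation rule of the beamformers is not stated in this passage, so the NLMS rule, the names `nlms_bin_update`, `ref`, `primary`, and the step size are all assumptions.

```python
def nlms_bin_update(w, ref, primary, mu=0.5, eps=1e-12):
    """One normalized-LMS step for a single frequency bin.

    w:       current complex filter coefficient for this bin
    ref:     complex reference input (e.g. a channel difference)
    primary: complex primary input (e.g. a channel sum)
    Returns (output, updated_w), where output = primary - w * ref.
    """
    out = primary - w * ref
    w_new = w + mu * out * ref.conjugate() / (abs(ref) ** 2 + eps)
    return out, w_new
```

Each bin costs a handful of complex multiplications per frame, whereas a time-domain adaptive filter of comparable resolution needs one multiply-accumulate per tap for every sample.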

[0348] Furthermore, according to the invention, in addition to the large reduction in the processing load of the adaptive filter, frequency analysis other than the frequency analysis of the input voice can be omitted, and the conversion from the time domain to the frequency domain that was needed for the filter computation also becomes unnecessary, so the overall amount of computation can be greatly reduced.

[0349] That is, in the prior art, spectral subtraction (hereinafter abbreviated as SS) is performed after the beamformer processing in order to suppress diffuse noise that the beamformer cannot suppress. Because SS takes a frequency spectrum as its input, frequency analysis such as an FFT (fast Fourier transform) was conventionally required. A beamformer operating in the frequency domain, however, outputs a frequency spectrum that can be used directly for SS, so the conventional FFT step performed specifically for SS can be omitted. The overall amount of computation can therefore be greatly reduced.
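The point of this paragraph can be sketched as follows: when both beamformers already deliver complex spectra, spectral subtraction is a per-bin gain computation with no transform step in between. This is a generic power spectral subtraction, not the band-weighted procedure of the SS processing unit 30; `alpha` and `floor` are hypothetical parameters.

```python
import math


def spectral_subtract(speech_spec, noise_spec, alpha=1.0, floor=0.01):
    """Per-bin power spectral subtraction on beamformer output spectra.

    speech_spec: complex spectrum from the target-voice beamformer
    noise_spec:  complex spectrum from the noise beamformer
    alpha:       subtraction factor (assumed)
    floor:       lower bound on the power gain, limits musical noise (assumed)
    The phase of speech_spec is kept unchanged.
    """
    out = []
    for s, n in zip(speech_spec, noise_spec):
        p_speech = abs(s) ** 2
        p_noise = abs(n) ** 2
        if p_speech == 0.0:
            out.append(0j)
            continue
        gain_sq = max(1.0 - alpha * p_noise / p_speech, floor)
        out.append(s * math.sqrt(gain_sq))
    return out
```

Note that the function takes spectra in and returns spectra out; no FFT appears anywhere in it, which is the saving described above.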

[0350] The conversion from the time domain to the frequency domain that was needed for direction estimation using the beamformer filter also becomes unnecessary, so the overall amount of computation can be greatly reduced.

[0351] Second, the present invention comprises: voice input means for receiving a speaker's voice at two or more different positions and outputting each as a voice signal; frequency analysis means for performing frequency analysis on the voice signal of each channel corresponding to a sound receiving position and outputting frequency components for each channel; first beamformer processing means for obtaining a target voice component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that the sensitivity outside the desired direction becomes low, thereby performing incoming noise suppression processing that suppresses sounds other than the voice from the speaker direction; second beamformer processing means for obtaining a first noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that the sensitivity outside the desired direction becomes low, thereby suppressing the voice from the speaker direction; third beamformer processing means for obtaining a second noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that the sensitivity outside the desired direction becomes low, thereby suppressing the voice from the speaker direction; noise direction estimating means for estimating a noise direction from the filter coefficients calculated by the first beamformer processing means; first target sound direction estimating means for estimating a first target sound direction from the filter coefficients calculated by the second beamformer processing means; second target sound direction estimating means for estimating a second target sound direction from the filter coefficients calculated by the third beamformer processing means; first input direction correcting means for sequentially correcting a first input direction, which is the arrival direction of the target sound to be input to the first beamformer processing means, on the basis of one or both of the first target sound direction estimated by the first target sound direction estimating means and the second target sound direction estimated by the second target sound direction estimating means; second input direction correcting means for sequentially correcting, when the noise direction estimated by the noise direction estimating means is within a predetermined first range, a second input direction, which is the arrival direction of the noise to be input to the second beamformer processing means, on the basis of that noise direction; third input direction correcting means for sequentially correcting, when the noise direction estimated by the noise direction estimating means is within a predetermined second range, a third input direction, which is the arrival direction of the noise to be input to the third beamformer processing means, on the basis of that noise direction; effective noise determining means for determining one of the first and second output noises to be the true noise output, according to whether the noise direction estimated by the noise direction estimating means arrived from the predetermined first range or from the predetermined second range, and outputting that noise, and at the same time determining which of the estimation results of the first voice direction estimating means and the second voice direction estimating means is valid and outputting that voice direction estimation result to the first input direction correcting means; spectral subtraction means for performing spectral subtraction processing, which is nonlinear noise suppression processing, on the basis of the output of the first beamformer processing means and the output of the second beamformer processing means; directionality detecting means for calculating a directionality index based on the time difference and the amplitude difference of the incoming sound from the frequency components output by the frequency analysis means; and spectral subtraction control means for controlling the spectral subtraction processing of the spectral subtraction means on the basis of the directionality index and the target sound direction output from the target sound direction estimating means.

[0352] In this configuration, the voice input means receives the voice uttered by the speaker at two or more different positions, and the frequency analysis means performs frequency analysis on the voice signal of each channel corresponding to a sound receiving position and outputs the frequency components of the plurality of channels. The first beamformer processing means applies adaptive filter processing to the frequency components of the plurality of channels obtained by the frequency analysis means, using filter coefficients calculated so that the sensitivity outside the desired direction becomes low, and thereby performs incoming noise suppression processing that suppresses sounds other than the voice from the speaker direction to obtain the target voice component. The second beamformer processing means applies adaptive filter processing to the same frequency components, using filter coefficients calculated so that the sensitivity outside the desired direction becomes low, and thereby suppresses the voice from the speaker direction to obtain a noise component. The noise direction estimating means estimates the noise direction from the filter coefficients calculated by the first beamformer processing means, and the target sound direction estimating means estimates the target sound direction from the filter coefficients calculated by the second beamformer processing means.
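Estimating a direction from filter coefficients can be sketched by scanning the filter's directivity pattern and taking the angle of lowest sensitivity, since a beamformer that passes the target voice places its null toward the noise. The estimation procedure itself is not described in this passage, so this null search is an assumption, as are the two-microphone far-field model, the spacing, the sound speed, and the name `null_direction_deg`.

```python
import cmath
import math


def null_direction_deg(weights, freqs_hz, mic_spacing_m=0.1,
                       sound_speed=340.0, step_deg=1):
    """Return the angle (degrees, 0 = broadside) at which a two-channel
    frequency-domain filter has its lowest summed sensitivity.

    weights: list of (w1, w2) complex coefficients, one pair per bin; the
             beamformer output for a bin is assumed to be w1*x1 + w2*x2.
    """
    best_angle = None
    best_response = float("inf")
    for angle in range(-90, 91, step_deg):
        sin_t = math.sin(math.radians(angle))
        total = 0.0
        for (w1, w2), f in zip(weights, freqs_hz):
            # Far-field plane wave: channel 2 lags channel 1 by d*sin(angle)/c.
            phase = -2.0 * math.pi * f * mic_spacing_m * sin_t / sound_speed
            total += abs(w1 + w2 * cmath.exp(1j * phase)) ** 2
        if total < best_response:
            best_angle, best_response = angle, total
    return best_angle
```

Because the coefficients are already held per frequency bin, the scan works on them directly and needs no time-to-frequency conversion.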

[0353] The first target sound direction estimating means estimates the first target sound direction from the filter coefficients calculated by the second beamformer processing means, and the second target sound direction estimating means estimates the second target sound direction from the filter coefficients calculated by the third adaptive beamformer processing means.

[0354] The first input direction correcting means sequentially corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, on the basis of one or both of the first target sound direction estimated by the first target sound direction estimating means and the second target sound direction estimated by the second target sound direction estimating means. The second input direction correcting means sequentially corrects the second input direction, which is the arrival direction of the noise to be input to the second beamformer, on the basis of the noise direction estimated by the noise direction estimating means when that noise direction is within the predetermined first range. The third input direction correcting means sequentially corrects the third input direction, which is the arrival direction of the noise to be input to the third beamformer, on the basis of the noise direction estimated by the noise direction estimating means when that noise direction is within the predetermined second range.
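The range-gated sequential correction described here can be sketched as a simple recursive update. The update rule is not specified in this passage; the first-order smoothing, the `rate` parameter, and the function name are assumptions for illustration only.

```python
def correct_input_direction(current_deg, estimated_deg, allowed_range_deg,
                            rate=0.2):
    """Move a beamformer's input direction toward a new estimate, but only
    when the estimate lies inside the range this beamformer is responsible for.

    rate is an assumed smoothing factor in (0, 1]; 1.0 jumps to the estimate.
    """
    low, high = allowed_range_deg
    if not (low <= estimated_deg <= high):
        return current_deg  # estimate belongs to another range: hold
    return current_deg + rate * (estimated_deg - current_deg)
```

Called once per frame, this keeps each noise-tracking beamformer steered inside its own range and leaves it untouched while the noise is in the other range.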

[0355] Accordingly, the second beamformer, whose second input direction is corrected by the output of the second input direction correcting means, suppresses components arriving from directions other than the second input direction and extracts the remaining noise component. Likewise, the third beamformer, whose third input direction is corrected by the output of the third input direction correcting means, suppresses components arriving from directions other than the third input direction and extracts the remaining noise component.

[0356] The effective noise determining means then determines one of the first output noise and the second output noise to be the true noise output, according to whether the noise direction estimated by the noise direction estimating means arrived from the predetermined first range or from the predetermined second range, and outputs that noise. At the same time, it determines which of the estimation results of the first voice direction estimating means and the second voice direction estimating means is valid and outputs the valid voice direction estimation result to the first input direction correcting means.
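The selection performed by the effective noise determining means can be sketched as a lookup on the estimated noise direction. The ranges, argument names, and the handling of a direction that falls in neither range are hypothetical; this passage does not say what happens in that last case.

```python
def select_effective_noise(noise_angle_deg, range1_deg, range2_deg,
                           noise_out2, noise_out3,
                           target_dir1_deg, target_dir2_deg):
    """Pick the noise output and the target-direction estimate that belong to
    the beamformer currently tracking the noise.

    noise_out2 / target_dir1_deg come from the second beamformer (first range),
    noise_out3 / target_dir2_deg from the third beamformer (second range).
    Returns None when the direction is in neither range (assumed behaviour).
    """
    if range1_deg[0] <= noise_angle_deg <= range1_deg[1]:
        return noise_out2, target_dir1_deg
    if range2_deg[0] <= noise_angle_deg <= range2_deg[1]:
        return noise_out3, target_dir2_deg
    return None
```

The selected noise output would feed the spectral subtraction, and the selected direction estimate would feed the first input direction correcting means.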

[0357] As a result, the target sound direction correcting means sequentially corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, on the basis of the target sound direction obtained by whichever target sound direction estimating means was determined to be valid, so the first beamformer suppresses noise components arriving from directions other than the first input direction and extracts the speaker's voice component with low noise.

[0358] As described above, this system can separately obtain a voice frequency component in which the noise component has been suppressed and a noise frequency component in which the voice component has been suppressed. The most important features of the invention are, first, that beamformers operating in the frequency domain are used as the first and second beamformers and, second, that, in order to cope with sudden noise as well, a directionality detection means is incorporated that uses short-time data to obtain, with high accuracy, a directionality index for deciding whether an incoming sound arrived from the target direction, and spectral subtraction is controlled on the basis of this directionality index and the speaker direction obtained in the conventional processing, so that sudden noise is suppressed.

[0359] Accordingly, by deciding between target sound and noise on the basis of both the directionality index described above and the speaker direction obtained from the directivity of the beamformer filter, sounds arriving from outside the set speaker range can be removed, and short-duration signals such as sudden noise can also be removed with high accuracy, so that noise suppression in a real environment can be performed with very high accuracy.

[0360] In addition, using beamformers that operate in the frequency domain as the first and second beamformers greatly reduces the amount of computation.

[0361] Furthermore, according to the invention, in addition to the large reduction in the processing load of the adaptive filter, frequency analysis other than the frequency analysis of the input voice can be omitted, and the conversion from the time domain to the frequency domain that was needed for the filter computation also becomes unnecessary, so the overall amount of computation can be greatly reduced.

[0362] In the present invention, noise-tracking beamformers with entirely different monitoring regions are provided. The voice direction is estimated from the output of each, and it is judged from the estimation results which beamformer is tracking the noise effectively. The voice direction estimate based on the filter coefficients of the beamformer judged to be effective is given to the first target sound direction correcting means, which sequentially corrects the first input direction, which is the arrival direction of the target sound to be input to the first beamformer, on the basis of the target sound direction estimated by the target sound direction estimating means. The first beamformer can therefore suppress noise components arriving from directions other than the first input direction and extract the speaker's voice component with low noise, and even if the noise source moves, it can be tracked and suppressed without being lost.

[0363] In the prior art, in order to allow tracking of the target sound source with only two channels, that is, with only two microphones, a single noise-tracking beamformer is used in addition to the noise-suppressing beamformer. However, when, for example, the noise source moves across the direction of the target sound, the noise tracking accuracy may be degraded.

[0364] In the present invention, by contrast, a plurality of noise-tracking beamformers are used, each responsible for a separate tracking range, so that degradation of the tracking accuracy can be prevented even in such a case.

[0365] The present invention is not limited to the embodiments described above and can be implemented with various modifications.

[0366]

[Effects of the Invention] As described in detail above, according to the present invention, the overall amount of computation can be greatly reduced. In particular, the conversion from the time domain to the frequency domain that was needed for direction estimation using the beamformer filter becomes unnecessary.

[0367] In the present invention, a beamformer that extracts the noise component is provided and its output is used, so the phase shift is corrected and highly accurate spectral subtraction can be achieved even for non-stationary noise. Furthermore, because the output of a frequency-domain beamformer is used, spectral subtraction can be performed without an additional frequency analysis, so non-stationary noise can be suppressed with less computation than before. Not only directional noise components but also non-directional noise components (background noise) can be suppressed, and a voice component with little distortion can be extracted.

[0368] In particular, because the present invention decides between target sound and noise on the basis of both the directionality index and the speaker direction obtained from the directivity of the beamformer filter, sounds arriving from outside the set speaker range can be removed, and short-duration signals such as sudden noise can also be removed with high accuracy, so that noise suppression in a real environment can be performed with very high accuracy.

[Brief description of drawings]

FIG. 1 is an overall block diagram showing a basic configuration example of the present invention.

FIG. 2 is a flowchart showing the processing flow of the spectral subtraction processing control unit used in the system of the present invention.

FIG. 3 is a block diagram showing the overall configuration of a configuration example of the speaker tracking microphone array 10 used in the system of the present invention.

FIG. 4 is a diagram explaining a configuration example and an operation example of the beamformer used in the system of the present invention.

FIG. 5 is a flowchart explaining the operation of the direction estimating unit used in the system of the present invention.

FIG. 6 is a flowchart explaining the operation of the system of the present invention.

FIG. 7 is an overall block diagram showing another configuration example of the system of the present invention.

FIG. 8 is a diagram explaining the tracking ranges of the beamformers in configuration example 2 of the system of the present invention.

FIG. 9 is a flowchart explaining the operation of the system in configuration example 2 of the system of the present invention.

FIG. 10 is a block diagram showing a configuration example of the spectral subtraction (SS) processing unit 30 used in the system of the present invention.

FIG. 11 is a flowchart explaining the operation of the spectral subtraction (SS) processing unit 30 used in the system of the present invention.

FIG. 12 is a block diagram showing a configuration example of the directionality detection unit used in the system of the present invention.

FIG. 13 is a diagram showing the processing flow of the directionality index detection unit used in the system of the present invention.

FIG. 14 is a block diagram showing another configuration example of the directionality detection unit used in the system of the present invention.

FIG. 15 is a diagram showing the processing flow of the directionality index detection unit used in the system of the present invention.

FIG. 16 is a block diagram showing another specific configuration example of the system of the present invention.

FIG. 17 is a diagram showing the overall processing flow of the microphone array used in the system of the present invention.

FIG. 18 is a block diagram showing another specific configuration example of the system of the present invention.

FIG. 19 is a block diagram showing another configuration example of the spectral subtraction (SS) processing unit 30 used in the system of the present invention.

FIG. 20 is a flowchart explaining the operation of the spectral subtraction (SS) processing unit 30 having the configuration of FIG. 19, used in the system of the present invention.

[Explanation of symbols]

10 ... Speaker tracking microphone array
11 ... Voice input unit
12 ... Frequency analysis unit
13 ... First beamformer
14 ... First input direction correction unit
15 ... Second input direction correction unit
16 ... Second beamformer
17 ... Noise direction estimation unit
18 ... First voice direction estimation unit (target sound direction estimation unit)
21 ... Third input direction correction unit
22 ... Third beamformer
23 ... Second voice direction estimation unit
24 ... Effective noise determination unit
30 ... Spectral subtraction (SS) processing unit
31 ... Voice band power calculation unit
32 ... Noise band power calculation unit
33 ... Band weight calculation unit
34 ... Spectrum subtraction unit
35 ... Input signal band power calculation unit
40 ... Direction detection unit
50 ... Spectral subtraction control unit

Continuation of the front page

(56) References: JP-A-11-52977 (JP, A); JP-A-5-52923 (JP, A); JP-A-2000-47699 (JP, A); JP-A-4-245300 (JP, A); JP-A-10-207490 (JP, A); JP-A-11-41687 (JP, A); Hitoshi Nagata, "Study on noise suppression by speaker tracking 2ch beamformer", IEICE Technical Report [Speech], July 18, 1997, SP97-36, pp. 17-22

(58) Fields surveyed (Int. Cl.⁷, DB name): G10L 21/02; G01S 3/802; G01S 7/523; JICST file (JOIS)

Claims

(57) [Claims]

1. A noise component suppressing device comprising:
voice input means for receiving the voice of a speaker at two or more different positions and outputting a voice signal for each position;
frequency analysis means for performing frequency analysis on the voice signal of each channel corresponding to a sound receiving position and outputting frequency components for each channel;
first beamformer processing means for suppressing incoming noise other than the target voice by adaptive filter processing using the frequency components of the plurality of channels output by the frequency analysis means, and outputting a signal of the target voice component;
second beamformer processing means for suppressing the target voice by adaptive filter processing using the frequency components of the plurality of channels output by the frequency analysis means, and outputting a signal of the noise component;
noise direction estimating means for estimating a noise direction from the filter coefficients calculated by the first beamformer processing means;
target sound direction estimating means for estimating a target sound direction from the filter coefficients calculated by the second beamformer processing means;
first input direction correcting means for sequentially correcting a first input direction, which is the arrival direction of the target sound to be input in the first beamformer processing means, based on the target sound direction estimated by the target sound direction estimating means;
second input direction correcting means for sequentially correcting a second input direction, which is the arrival direction of the noise to be input in the second beamformer processing means, based on the noise direction estimated by the noise direction estimating means;
spectrum subtraction means for performing spectrum subtraction processing, which is nonlinear noise suppression processing, based on the output of the first beamformer processing means and the output of the second beamformer processing means;
directionality detecting means for calculating a directionality index, based on the time difference and the amplitude difference of the incoming sound, from the frequency components output by the frequency analysis means; and
spectrum subtraction control means for controlling the spectrum subtraction processing of the spectrum subtraction means based on the directionality index and the target sound direction output from the target sound direction estimating means,
the device outputting a target voice frequency component.
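
The spectrum subtraction stage recited in claim 1 can be illustrated with a minimal sketch. The claim does not give a formula, so the power-domain subtraction rule, the over-subtraction factor `alpha` (standing in for whatever the spectrum subtraction control means supplies), and the spectral `floor` are all assumptions introduced here for illustration only.

```python
import numpy as np

def spectral_subtract(target_spec, noise_spec, alpha=1.0, floor=0.01):
    """Power spectral subtraction on one frame (illustrative sketch).

    target_spec: complex spectrum assumed to come from the first beamformer
    noise_spec:  complex spectrum assumed to come from the second beamformer
    alpha:       hypothetical over-subtraction factor set by a controller
    floor:       hypothetical lower bound, as a fraction of the input power
    """
    target_spec = np.asarray(target_spec, dtype=complex)
    noise_spec = np.asarray(noise_spec, dtype=complex)
    target_pow = np.abs(target_spec) ** 2
    noise_pow = np.abs(noise_spec) ** 2
    # Subtract the noise power estimate, never going below the floor.
    clean_pow = np.maximum(target_pow - alpha * noise_pow, floor * target_pow)
    # Convert back to an amplitude gain and keep the phase of the target beam.
    gain = np.sqrt(clean_pow / np.maximum(target_pow, 1e-12))
    return gain * target_spec
```

Because the gain depends on the signal itself through the maximum operation, the processing is nonlinear, which matches the way the claim characterizes spectrum subtraction.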

2. A noise component suppressing device comprising:
voice input means, comprising a plurality of microphones each having directivity and arranged with their directivity axes inclined relative to each other, for obtaining the voice of a speaker as voice signals;
frequency analysis means for performing frequency analysis on the voice signal of each microphone of the voice input means and outputting frequency components for each channel;
first beamformer processing means for suppressing incoming noise other than the target voice by adaptive filter processing using the frequency components of the plurality of channels output by the frequency analysis means, and outputting a signal of the target voice component;
second beamformer processing means for suppressing the target voice by adaptive filter processing using the frequency components of the plurality of channels output by the frequency analysis means, and outputting a signal of the noise component;
noise direction estimating means for estimating a noise direction from the filter coefficients calculated by the first beamformer processing means;
target sound direction estimating means for estimating a target sound direction from the filter coefficients calculated by the second beamformer processing means;
first input direction correcting means for sequentially correcting a first input direction, which is the arrival direction of the target sound to be input in the first beamformer processing means, based on the target sound direction estimated by the target sound direction estimating means;
second input direction correcting means for sequentially correcting a second input direction, which is the arrival direction of the noise to be input in the second beamformer processing means, based on the noise direction estimated by the noise direction estimating means;
spectrum subtraction means for performing spectrum subtraction processing, which is nonlinear noise suppression processing, based on the output of the first beamformer processing means and the output of the second beamformer processing means;
directionality detecting means including: phase difference equivalent amount calculating means for calculating, based on the phase difference between the frequency components of two channels among the per-channel frequency components output by the frequency analysis means, a phase difference equivalent amount, which is an amount corresponding to the time difference between the channel signals; inter-channel power ratio calculating means for calculating a power ratio between the channels from the frequency components of the two channels; and directionality calculating means for calculating a directionality index based on the phase difference equivalent amount calculated by the phase difference equivalent amount calculating means and the inter-channel power ratio calculated by the inter-channel power ratio calculating means; and
spectrum subtraction control means for controlling the processing of the spectrum subtraction means based on the directionality index obtained by the directionality detecting means and the target sound direction output from the target sound direction estimating means,
the device outputting a target voice frequency component.
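
As a rough illustration of the directionality detecting means of claim 2, the sketch below computes a phase-based quantity and an inter-channel power ratio from two channel spectra and combines them into one index. The claim does not define these quantities numerically: the use of the normalized real part of the cross-spectrum as the "phase difference equivalent amount", the decibel power ratio, the linear combination, and the parameter `ratio_weight` are all assumptions made here.

```python
import numpy as np

def directionality_index(x1, x2, ratio_weight=0.1, eps=1e-12):
    """Return (index, phase_term, ratio_db) for two channel spectra.

    phase_term: normalized real part of the cross-spectrum; close to 1 when
                the two channels are in phase (no arrival-time difference).
    ratio_db:   inter-channel power ratio in decibels; close to 0 when both
                microphones receive the sound at equal level.
    The combination rule below is hypothetical, chosen only so that the
    index is high for an in-phase, equal-level (frontal) source.
    """
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    p1 = np.sum(np.abs(x1) ** 2)
    p2 = np.sum(np.abs(x2) ** 2)
    phase_term = float(np.real(np.sum(x1 * np.conj(x2))) / (np.sqrt(p1 * p2) + eps))
    ratio_db = float(10.0 * np.log10((p1 + eps) / (p2 + eps)))
    index = phase_term - ratio_weight * abs(ratio_db)
    return index, phase_term, ratio_db
```

With microphones whose directivity axes are inclined to each other, a source off the center line would be expected to show up mainly in the power ratio, while a time difference shows up in the phase term; using both is what distinguishes this claim from claim 1.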

3. A noise component suppressing device comprising:
voice input means, comprising a plurality of microphones each having directivity and arranged with their directivity axes inclined relative to each other, for obtaining the voice of a speaker as voice signals;
frequency analysis means for performing frequency analysis on the voice signal of each microphone of the voice input means and outputting frequency components for each channel;
first beamformer processing means for suppressing incoming noise other than the target voice by adaptive filter processing using the frequency components of the plurality of channels output by the frequency analysis means, and outputting a signal of the target voice component;
second beamformer processing means for suppressing the target voice by adaptive filter processing using the frequency components of the plurality of channels output by the frequency analysis means, and outputting a signal of the noise component;
noise direction estimating means for estimating a noise direction from the filter coefficients calculated by the first beamformer processing means;
target sound direction estimating means for estimating a target sound direction from the filter coefficients calculated by the second beamformer processing means;
first input direction correcting means for sequentially correcting a first input direction, which is the arrival direction of the target sound to be input in the first beamformer processing means, based on the target sound direction estimated by the target sound direction estimating means;
second input direction correcting means for sequentially correcting a second input direction, which is the arrival direction of the noise to be input in the second beamformer processing means, based on the noise direction estimated by the noise direction estimating means;
spectrum subtraction means for performing spectrum subtraction processing, which is nonlinear noise suppression processing, based on the output of the first beamformer processing means and the output of the second beamformer processing means;
directionality detecting means including: spectrum normalizing means for normalizing the magnitudes of the frequency components of two channels among the per-channel frequency components output by the frequency analysis means; and inter-channel spectral difference calculating means for calculating the power of the difference component between the two channel frequency components normalized by the spectrum normalizing means and obtaining the power of the difference spectrum as a directionality index; and
spectrum subtraction control means for controlling the processing of the spectrum subtraction means based on the directionality index obtained by the directionality detecting means and the target sound direction output from the target sound direction estimating means,
the device outputting a target voice frequency component.
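
The alternative directionality index of claim 3, the power of the difference between two normalized channel spectra, can be sketched as follows. The claim does not state which normalization is used; scaling each complex spectrum to unit Euclidean norm is an assumption made here, and the function name is hypothetical.

```python
import numpy as np

def spectral_difference_index(x1, x2, eps=1e-12):
    """Power of the difference between two unit-norm channel spectra.

    Under the assumed normalization the result lies between 0 (the two
    channels are identical up to a positive gain) and 4 (the channels are
    exactly opposite in sign).
    """
    x1 = np.asarray(x1, dtype=complex)
    x2 = np.asarray(x2, dtype=complex)
    # Normalize away overall level so only shape and phase differences remain.
    x1n = x1 / (np.linalg.norm(x1) + eps)
    x2n = x2 / (np.linalg.norm(x2) + eps)
    return float(np.sum(np.abs(x1n - x2n) ** 2))
```

A small value suggests that both microphones see essentially the same spectrum, which is what one would expect for a single source on the center line; how the control means maps this value to a subtraction strength is not specified in the claim.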

4. A noise component suppressing device comprising:
voice input means for receiving the voice of a speaker at two or more different positions and outputting a voice signal for each position;
frequency analysis means for performing frequency analysis on the voice signal of each channel corresponding to a sound receiving position and outputting frequency components for each channel;
first beamformer processing means for obtaining a target voice component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby performing incoming noise suppression processing that suppresses sound other than the voice from the speaker direction;
second beamformer processing means for obtaining a first noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
third beamformer processing means for obtaining a second noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
noise direction estimating means for estimating a noise direction from the filter coefficients calculated by the first beamformer processing means;
first target sound direction estimating means for estimating a first target sound direction from the filter coefficients calculated by the second beamformer processing means;
second target sound direction estimating means for estimating a second target sound direction from the filter coefficients calculated by the third beamformer processing means;
first input direction correcting means for sequentially correcting a first input direction, which is the arrival direction of the target sound to be input in the first beamformer processing means, based on the first target sound direction estimated by the first target sound direction estimating means and/or the second target sound direction estimated by the second target sound direction estimating means;
second input direction correcting means for sequentially correcting, when the noise direction estimated by the noise direction estimating means is within a predetermined first range, a second input direction, which is the arrival direction of the noise to be input in the second beamformer processing means, based on the noise direction;
third input direction correcting means for sequentially correcting, when the noise direction estimated by the noise direction estimating means is within a predetermined second range, a third input direction, which is the arrival direction of the noise to be input in the third beamformer processing means, based on the noise direction;
effective noise determining means for determining, according to whether the noise direction estimated by the noise direction estimating means lies in the predetermined first range or in the predetermined second range, which of the first noise component and the second noise component is the true noise output and outputting that noise component, and at the same time determining which of the estimation results of the first and second target sound direction estimating means is valid and outputting that estimation result to the first input direction correcting means;
spectrum subtraction means for performing spectrum subtraction processing, which is nonlinear noise suppression processing, based on the output of the first beamformer processing means and the output of the second beamformer processing means;
directionality detecting means for calculating a directionality index, based on the time difference and the amplitude difference of the incoming sound, from the frequency components output by the frequency analysis means; and
spectrum subtraction control means for controlling the spectrum subtraction processing of the spectrum subtraction means based on the directionality index and the target sound direction output from the target sound direction estimating means,
the device outputting a target voice frequency component.

5. A noise component suppressing device comprising:
voice input means, comprising a plurality of microphones each having directivity and arranged with their directivity axes inclined relative to each other, for obtaining the voice of a speaker as voice signals;
frequency analysis means for performing frequency analysis on the voice signal of each microphone of the voice input means and outputting frequency components for each channel;
first beamformer processing means for obtaining a target voice component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby performing incoming noise suppression processing that suppresses sound other than the voice from the speaker direction;
second beamformer processing means for obtaining a first noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
third beamformer processing means for obtaining a second noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
noise direction estimating means for estimating a noise direction from the filter coefficients calculated by the first beamformer processing means;
first target sound direction estimating means for estimating a first target sound direction from the filter coefficients calculated by the second beamformer processing means;
second target sound direction estimating means for estimating a second target sound direction from the filter coefficients calculated by the third beamformer processing means;
first input direction correcting means for sequentially correcting a first input direction, which is the arrival direction of the target sound to be input in the first beamformer processing means, based on the first target sound direction estimated by the first target sound direction estimating means and/or the second target sound direction estimated by the second target sound direction estimating means;
second input direction correcting means for sequentially correcting, when the noise direction estimated by the noise direction estimating means is within a predetermined first range, a second input direction, which is the arrival direction of the noise to be input in the second beamformer processing means, based on the noise direction;
third input direction correcting means for sequentially correcting, when the noise direction estimated by the noise direction estimating means is within a predetermined second range, a third input direction, which is the arrival direction of the noise to be input in the third beamformer processing means, based on the noise direction;
effective noise determining means for determining, according to whether the noise direction estimated by the noise direction estimating means lies in the predetermined first range or in the predetermined second range, which of the first noise component and the second noise component is the true noise output and outputting that noise component, and at the same time determining which of the estimation results of the first and second target sound direction estimating means is valid and outputting that estimation result to the first input direction correcting means;
spectrum subtraction means for performing spectrum subtraction processing, which is nonlinear noise suppression processing, based on the output of the first beamformer processing means and the output of the second beamformer processing means;
directionality detecting means including: phase difference equivalent amount calculating means for calculating, based on the phase difference between the frequency components of two channels among the per-channel frequency components output by the frequency analysis means, a phase difference equivalent amount, which is an amount corresponding to the time difference between the channel signals; inter-channel power ratio calculating means for calculating a power ratio between the channels from the frequency components of the two channels; and directionality calculating means for calculating a directionality index based on the phase difference equivalent amount calculated by the phase difference equivalent amount calculating means and the inter-channel power ratio calculated by the inter-channel power ratio calculating means; and
spectrum subtraction control means for controlling the spectrum subtraction processing of the spectrum subtraction means based on the directionality index obtained by the directionality detecting means and the target sound direction output from the target sound direction estimating means,
the device outputting a target voice frequency component.

6. A noise component suppressing device comprising:
voice input means, comprising a plurality of microphones each having directivity and arranged with their directivity axes inclined relative to each other, for obtaining the voice of a speaker as voice signals;
frequency analysis means for performing frequency analysis on the voice signal of each microphone of the voice input means and outputting frequency components for each channel;
first beamformer processing means for obtaining a target voice component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby performing incoming noise suppression processing that suppresses sound other than the voice from the speaker direction;
second beamformer processing means for obtaining a first noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
third beamformer processing means for obtaining a second noise component by applying, to the frequency components of the plurality of channels obtained by the frequency analysis means, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
noise direction estimating means for estimating a noise direction from the filter coefficients calculated by the first beamformer processing means;
first target sound direction estimating means for estimating a first target sound direction from the filter coefficients calculated by the second beamformer processing means;
second target sound direction estimating means for estimating a second target sound direction from the filter coefficients calculated by the third beamformer processing means;
first input direction correcting means for sequentially correcting a first input direction, which is the arrival direction of the target sound to be input in the first beamformer processing means, based on the first target sound direction estimated by the first target sound direction estimating means and/or the second target sound direction estimated by the second target sound direction estimating means;
second input direction correcting means for sequentially correcting, when the noise direction estimated by the noise direction estimating means is within a predetermined first range, a second input direction, which is the arrival direction of the noise to be input in the second beamformer processing means, based on the noise direction;
third input direction correcting means for sequentially correcting, when the noise direction estimated by the noise direction estimating means is within a predetermined second range, a third input direction, which is the arrival direction of the noise to be input in the third beamformer processing means, based on the noise direction;
effective noise determining means for determining, according to whether the noise direction estimated by the noise direction estimating means lies in the predetermined first range or in the predetermined second range, which of the first noise component and the second noise component is the true noise output and outputting that noise component, and at the same time determining which of the estimation results of the first and second target sound direction estimating means is valid and outputting that estimation result to the first input direction correcting means;
spectrum subtraction means for performing spectrum subtraction processing, which is nonlinear noise suppression processing, based on the output of the first beamformer processing means and the output of the second beamformer processing means;
directionality detecting means including: spectrum normalizing means for normalizing the magnitudes of the frequency components of two channels among the per-channel frequency components output by the frequency analysis means; and inter-channel spectral difference calculating means for calculating the power of the difference component between the two channel frequency components normalized by the spectrum normalizing means and obtaining the power of the difference spectrum as a directionality index; and
spectrum subtraction control means for controlling the spectrum subtraction processing of the spectrum subtraction means based on the directionality index obtained by the directionality detecting means and the target sound direction output from the target sound direction estimating means,
the device outputting a target voice frequency component.

7. The noise component suppressing device according to any one of claims 1 to 6, wherein the voice input means for obtaining the voice of the speaker as voice signals comprises a plurality of microphones each having directivity and arranged with their directivity axes inclined relative to each other, and the frequency analysis means performs frequency analysis on the voice signal of each microphone of the voice input means and outputs frequency components for each channel.

8. The noise component suppressing device according to claim 1, wherein the spectrum subtraction means comprises:
voice band power calculating means for dividing the obtained voice frequency components into frequency bands and calculating voice power for each band;
noise band power calculating means for dividing the obtained noise frequency components into frequency bands and calculating noise power for each band;
band weight calculating means for obtaining a band weight coefficient based on the per-band voice and noise powers obtained from the voice band power calculating means and the noise band power calculating means and on the output of the spectrum subtraction control means; and
spectrum subtracting means for suppressing background noise by multiplying the voice signal by the band weight coefficient for each frequency band.
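
The band-wise structure of claim 8 can be sketched as below: voice and noise powers are summed per band, a weight is derived for each band, and the voice spectrum is multiplied by that weight. The weight formula, the equal-width band split, and the parameters `alpha` (a stand-in for the output of the spectrum subtraction control means) and `floor` are assumptions for illustration; the claim itself only says that a band weight coefficient is obtained from these inputs.

```python
import numpy as np

def band_weight_subtraction(voice_spec, noise_spec, n_bands=4, alpha=1.0, floor=0.05):
    """Suppress noise by one multiplicative weight per frequency band.

    Returns the weighted spectrum and the per-band weights.
    """
    voice_spec = np.asarray(voice_spec, dtype=complex)
    noise_spec = np.asarray(noise_spec, dtype=complex)
    # Hypothetical band layout: contiguous bands of (nearly) equal width.
    bands = np.array_split(np.arange(len(voice_spec)), n_bands)
    out = voice_spec.copy()
    weights = np.empty(n_bands)
    for b, idx in enumerate(bands):
        voice_pow = np.sum(np.abs(voice_spec[idx]) ** 2)  # voice band power
        noise_pow = np.sum(np.abs(noise_spec[idx]) ** 2)  # noise band power
        ratio = noise_pow / max(voice_pow, 1e-12)
        # Assumed rule: amplitude weight from the subtracted power ratio.
        weights[b] = np.sqrt(max(1.0 - alpha * ratio, floor))
        out[idx] *= weights[b]
    return out, weights
```

Working with one weight per band instead of one per frequency bin reduces the number of weight computations, which is consistent with the small calculation volume that the abstract names as a goal.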

9. The noise component suppressing device according to claim 1, wherein the spectrum subtraction means comprises:
voice band power calculating means for dividing the obtained voice frequency components into frequency bands and calculating voice power for each band;
noise band power calculating means for dividing the obtained noise frequency components into frequency bands and calculating noise power for each band;
input band power calculating means for dividing, into frequency bands, the frequency components obtained by frequency analysis of the input signal from the voice input means, and calculating input power for each band;
band weight calculating means for obtaining a band weight coefficient based on the per-band voice and noise powers obtained from the voice band power calculating means and the noise band power calculating means, the per-band input power obtained by the input band power calculating means, and the output of the spectrum subtraction control means; and
modified spectrum subtracting means for suppressing background noise by applying the band weight coefficient to the voice signal for each frequency band.

10. A noise component suppression method comprising:
a step of receiving the voice of a speaker at two or more different positions to obtain respective voice signals;
a frequency analysis step of performing frequency analysis on the voice signal of each channel corresponding to a sound receiving position and outputting frequency components for each channel;
a first beamformer processing step of suppressing incoming noise other than the target voice by adaptive filter processing using the frequency components of the plurality of channels obtained in the frequency analysis step, and obtaining a signal of the target voice component;
a second beamformer processing step of suppressing the target voice by adaptive filter processing using the frequency components of the plurality of channels obtained in the frequency analysis step, and obtaining a signal of the noise component;
a noise direction estimation step of estimating a noise direction from the filter coefficients calculated in the first beamformer processing step;
a target sound direction estimation step of estimating a target sound direction from the filter coefficients calculated in the second beamformer processing step;
a first input direction correction step of sequentially correcting a first input direction, which is the arrival direction of the target sound to be input in the first beamformer processing step, based on the target sound direction estimated in the target sound direction estimation step;
a second input direction correction step of sequentially correcting a second input direction, which is the arrival direction of the noise to be input in the second beamformer processing step, based on the noise direction estimated in the noise direction estimation step;
a spectrum subtraction processing step of performing spectrum subtraction processing, which is nonlinear noise suppression processing, based on the output obtained in the first beamformer processing step and the output obtained in the second beamformer processing step;
a directionality detection step of calculating a directionality index, based on the time difference and the amplitude difference of the incoming sound, from the frequency components obtained in the frequency analysis step; and
a spectrum subtraction control step of controlling the spectrum subtraction processing of the spectrum subtraction processing step based on the directionality index and the target sound direction obtained in the target sound direction estimation step,
the method outputting a target voice frequency component.
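
The beamformer processing steps of claim 10 are recited only as adaptive filter processing on the per-channel frequency components. As one possible reading, the sketch below uses a two-channel Griffiths-Jim style structure with a per-bin normalized LMS update. This specific structure, the class name, and the parameters `mu` and `eps` are assumptions swapped in for illustration and are not taken from the claims; the sketch also assumes the two channels have already been phase-aligned toward the current input direction.

```python
import numpy as np

class TwoChannelAdaptiveBeamformer:
    """Per-bin two-channel adaptive beamformer (Griffiths-Jim style, NLMS)."""

    def __init__(self, n_bins, mu=0.5, eps=1e-12):
        self.w = np.zeros(n_bins, dtype=complex)  # one adaptive coefficient per bin
        self.mu = mu                              # hypothetical step size, 0 < mu < 2
        self.eps = eps                            # guards against division by zero

    def process(self, x1, x2):
        """Process one frame of channel spectra and adapt the coefficients."""
        x1 = np.asarray(x1, dtype=complex)
        x2 = np.asarray(x2, dtype=complex)
        fixed = 0.5 * (x1 + x2)   # passes a source that is in phase on both channels
        blocked = x1 - x2         # cancels that source, leaving a noise reference
        y = fixed - np.conj(self.w) * blocked
        # Normalized LMS update that reduces the output power in each bin.
        self.w += self.mu * blocked * np.conj(y) / (np.abs(blocked) ** 2 + self.eps)
        return y
```

In this sketch a source that is in phase on both channels passes unchanged, while a source with an inter-channel phase difference is progressively cancelled. The converged coefficients depend on that phase difference, which gives some intuition for why the claims can estimate a direction from the filter coefficients.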

11. A noise component suppression method comprising:
a voice input step of receiving the voice of a speaker at two or more different positions and outputting a voice signal for each position;
a frequency analysis step of performing frequency analysis on the voice signal of each channel corresponding to a sound receiving position and outputting frequency components for each channel;
a first beamformer processing step of obtaining a target voice component by applying, to the frequency components of the plurality of channels obtained in the frequency analysis step, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby performing incoming noise suppression processing that suppresses sound other than the voice from the speaker direction;
a second beamformer processing step of obtaining a first noise component by applying, to the frequency components of the plurality of channels obtained in the frequency analysis step, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
a third beamformer processing step of obtaining a second noise component by applying, to the frequency components of the plurality of channels obtained in the frequency analysis step, adaptive filter processing using filter coefficients calculated so that sensitivity outside a desired direction becomes low, thereby suppressing the voice from the speaker direction;
a noise direction estimation step of estimating a noise direction from the filter coefficients calculated in the first beamformer processing step;
a first target sound direction estimation step of estimating a first target sound direction from the filter coefficients calculated in the second beamformer processing step;
a second target sound direction estimation step of estimating a second target sound direction from the filter coefficients calculated in the third beamformer processing step;
a first input direction correction step of sequentially correcting a first input direction, which is the arrival direction of the target sound to be input in the first beamformer processing step, based on the first target sound direction estimated in the first target sound direction estimation step and/or the second target sound direction estimated in the second target sound direction estimation step;
a second input direction correction step of sequentially correcting, when the noise direction estimated in the noise direction estimation step is within a predetermined first range, a second input direction, which is the arrival direction of the noise to be input in the second beamformer processing step, based on the noise direction;
a third input direction correction step of sequentially correcting, when the noise direction estimated in the noise direction estimation step is within a predetermined second range, a third input direction, which is the arrival direction of the noise to be input in the third beamformer processing step, based on the noise direction;
an effective noise determination step of determining, according to whether the noise direction estimated in the noise direction estimation step lies in the predetermined first range or in the predetermined second range, which of the first noise component and the second noise component is the true noise output and outputting that noise component, and at the same time determining which of the estimation results of the first and second target sound direction estimation steps is valid and supplying that estimation result to the first input direction correction step;
a spectrum subtraction processing step of performing spectrum subtraction processing, which is nonlinear noise suppression processing, based on the outputs of the first beamformer processing step and the second beamformer processing step;
a directionality detection step of calculating a directionality index, based on the time difference and the amplitude difference of the incoming sound, from the frequency components obtained in the frequency analysis step; and
a spectrum subtraction control step of controlling the spectrum subtraction processing in the spectrum subtraction processing step based on the directionality index and the target sound direction output from the target sound direction estimation step,
the method outputting a target voice frequency component.