JP4298466B2 - Sound collection method, apparatus, program, and recording medium - Google Patents

Sound collection method, apparatus, program, and recording medium

Info

Publication number
JP4298466B2
Authority
JP
Japan
Prior art keywords
channel
received
sound
speaker
covariance matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2003370697A
Other languages
Japanese (ja)
Other versions
JP2005136709A (en)
Inventor
和則 小林
陽一 羽田
澄宇 阪内
末廣 島内
賢一 古家
暁 江村
章俊 片岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP2003370697A priority Critical patent/JP4298466B2/en
Publication of JP2005136709A publication Critical patent/JP2005136709A/en
Application granted granted Critical
Publication of JP4298466B2 publication Critical patent/JP4298466B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Stereophonic Arrangements (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

PROBLEM TO BE SOLVED: To realize sound collection with a sense of sound-image localization even when a speaker is at a location where the opening angle viewed from the microphones is small.
SOLUTION: A covariance matrix computing part 104 calculates covariance matrices from the received sound signals and stores them in a covariance matrix storage part 106 for each speaker location detected by a speaker location detecting part 105. An L channel filter coefficient calculating part 107L and an R channel filter coefficient calculating part 107R then calculate L channel and R channel filter coefficients from the stored covariance matrices and the L channel and R channel mixing coefficients, under the condition that each received speaker voice component is mixed with the mixing coefficients corresponding to its speaker location, and set these coefficients in L channel filters 102L1 to 102LM and R channel filters 102R1 to 102RM. The outputs of the L channel filters 102L1 to 102LM are added by an L channel adder 103L, and the outputs of the R channel filters 102R1 to 102RM are added by an R channel adder 103R, to obtain an L channel output signal and an R channel output signal.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

本発明は、TV会議や音声会議、電話、遠隔講義などの収音方法および装置に関する。   The present invention relates to a sound collection method and apparatus for TV conferences, audio conferences, telephone calls, remote lectures, and the like.

図10は従来技術の収音装置の構成図である。従来技術の収音装置は指向性マイクロホン901Lと901Rで構成され、その指向性の主軸は120°程度の開き角で配置されている。 FIG. 10 shows the configuration of a conventional sound collection device. The prior-art sound pickup device consists of directional microphones 901L and 901R, whose directivity main axes are arranged at an opening angle of about 120°.

2個の指向性マイクロホンを異なる方向を向けて配置することにより、話者の位置によってLチャネルとRチャネルに収音される音声レベルに差が生じる。これらの出力信号を2つのスピーカから再生することにより、音像の定位感のある再生を行うことができる。   By arranging the two directional microphones in different directions, a difference occurs in the sound level picked up by the L channel and the R channel depending on the position of the speaker. By reproducing these output signals from two speakers, it is possible to reproduce the sound image with a sense of localization.

例えば、図10の話者CはLチャネルマイクロホン901Lの主軸方向にいるので、収音された話者Cの音声レベルはLチャネルのほうが大きく、再生したときにLチャネル側のスピーカに音像が定位する。また、LチャネルとRチャネルのマイクロホン901L,901Rの中間にいる話者Aの音声は、両マイクロホンにほぼ同じレベルで収音されるので、LチャネルとRチャネルのスピーカの中間に音像が定位する。 For example, since speaker C in FIG. 10 is in the main-axis direction of the L channel microphone 901L, the picked-up voice of speaker C is higher in level in the L channel, and on playback the sound image is localized at the L channel loudspeaker. The voice of speaker A, who is midway between the L channel and R channel microphones 901L and 901R, is picked up at almost the same level by both microphones, so the sound image is localized midway between the L channel and R channel loudspeakers.

このように、従来技術では音像の定位感のあるステレオ収音を行うことができる。
中島平太郎ら著、応用電気音響、コロナ社出版、日本音響学会編、pp.262−268、昭和54年
In this way, the conventional technique can perform stereo sound collection with a sense of localization of the sound image.
Heitaro Nakajima et al., Applied Electroacoustics, Corona Publishing, edited by the Acoustical Society of Japan, pp. 262-268, 1979

上述した従来技術の収音方法では、以下に示す問題がある。   The above-described conventional sound collection method has the following problems.

音声の距離減衰の影響により、マイクロホンから距離が離れている話者の音声レベルが小さく聞き取りづらい。もし、マイクの感度を上昇させ距離の離れている話者に対して適正なレベルとしたとしても、マイクロホンに近い話者の音声が過大なレベルとなる。 Due to the distance attenuation of sound, the voice of a speaker far from the microphone is low in level and hard to hear. Even if the microphone sensitivity is raised so that a distant speaker is picked up at an appropriate level, the voice of a speaker close to the microphone then becomes excessively loud.

図10の話者Aと話者Bのようにマイクロホンから見た開き角が小さい場合、話者Aと話者Bのマイクロホン間のレベル差はほぼ同じとなり、音像の定位感が得られなくなる。 When the opening angle viewed from the microphones is small, as with speaker A and speaker B in FIG. 10, the inter-microphone level differences for speaker A and for speaker B become almost identical, and no sense of sound-image localization is obtained.

雑音や、スピーカから再生される受話信号がマイクロホンに収音され、聞き取りづらい音声となる。 Noise and the received signal reproduced from the loudspeaker are also picked up by the microphones, making the transmitted speech hard to hear.

複数人でテーブルを囲むTV会議や音声会議に従来技術を適用した場合、上記の問題を生じ、高品質な音声で音像の定位感のある通信を行うことが難しい。   When the conventional technology is applied to a TV conference or an audio conference in which a table is surrounded by a plurality of people, the above-described problem occurs, and it is difficult to perform communication with a sense of localization of sound images with high-quality audio.

本発明の目的は、マイクロホンから距離が離れている話者の音声を適正レベルにすることで聞き取りやすい音量での通話を実現し、マイクロホンから見た開き角が小さい位置に話者がいる場合でもLRチャネル間のレベル差を所望のレベル差とすることで音像の定位感のある収音を実現し、雑音と受話信号を抑圧した高品質な送話音声を得る収音方法、装置、およびプログラムを提供することである。 An object of the present invention is to provide a sound collection method, apparatus, and program that realize calls at an easy-to-hear volume by bringing the voice of a speaker far from the microphones to an appropriate level, that realize sound collection with a sense of sound-image localization by giving the L and R channels a desired level difference even when a speaker is at a position where the opening angle viewed from the microphones is small, and that obtain high-quality transmitted speech in which noise and the received signal are suppressed.

本発明の第1の態様によれば、収音方法は、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階を有する。
According to the first aspect of the present invention, the sound collection method comprises:
a speaker position detection step of detecting speaker positions from the received signals picked up by each of a plurality of sound pickup means;
a covariance matrix calculation step of calculating a covariance matrix from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix storage step of storing the covariance matrix for each speaker position;
an L channel mixing coefficient setting step of setting in advance an L channel mixing coefficient corresponding to each speaker position;
an L channel filter coefficient calculation step of calculating L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients, under the condition that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to that speaker position;
an R channel mixing coefficient setting step of setting in advance an R channel mixing coefficient corresponding to each speaker position;
an R channel filter coefficient calculation step of calculating R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients, under the condition that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to that speaker position;
an L channel filter step of filtering each of the signals received by the plurality of sound pickup means with the L channel filter coefficients;
an R channel filter step of filtering each of the signals received by the plurality of sound pickup means with the R channel filter coefficients;
an L channel addition step of adding the output signals of the L channel filter step; and
an R channel addition step of adding the output signals of the R channel filter step.

複数マイクロホンからの受音信号から話者位置を検出し、共分散行列を求め、話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。 The speaker position is detected from the signals received by a plurality of microphones, covariance matrices are obtained, filter coefficients that give the desired level difference between the L and R channels are obtained for each speaker position, and the microphone signals are filtered for each channel with these filter coefficients; a stereo output signal having the desired level difference for each speaker position can thereby be obtained.

本発明の第2の態様によれば、収音方法は、
複数の収音手段の各々で受音された受音信号から話者位置と雑音区間を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階を有する。
According to the second aspect of the present invention, the sound collection method comprises:
a speaker position detection step of detecting speaker positions and noise sections from the received signals picked up by each of a plurality of sound pickup means;
a covariance matrix calculation step of calculating a covariance matrix from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix storage step of storing the covariance matrix for each noise section and each speaker position;
an L channel mixing coefficient setting step of setting in advance an L channel mixing coefficient corresponding to each speaker position;
an L channel filter coefficient calculation step of calculating L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to that speaker position and that the noise component is suppressed;
an R channel mixing coefficient setting step of setting in advance an R channel mixing coefficient corresponding to each speaker position;
an R channel filter coefficient calculation step of calculating R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to that speaker position and that the noise component is suppressed;
an L channel filter step of filtering each of the signals received by the plurality of sound pickup means with the L channel filter coefficients;
an R channel filter step of filtering each of the signals received by the plurality of sound pickup means with the R channel filter coefficients;
an L channel addition step of adding the output signals of the L channel filter step; and
an R channel addition step of adding the output signals of the R channel filter step.

話者位置および雑音区間を推定し、各話者位置に対する共分散行列と雑音に対する共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、雑音の抑圧と、各話者からの音声信号がLチャネルとRチャネルでレベル差を持った良好な音像の定位が実現する。 The speaker positions and noise sections are estimated, a covariance matrix for each speaker position and a covariance matrix for the noise are stored, and the L channel and R channel filter coefficients are obtained from them; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. This realizes noise suppression together with good sound-image localization in which the voice signal from each speaker has a level difference between the L channel and the R channel.

本発明の第3の態様によれば、収音方法は、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出段階と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階を有する。
According to the third aspect of the present invention, the sound collection method comprises:
a transmission/reception detection step of detecting transmission sections, reception sections, and noise sections from the received signals picked up by each of a plurality of sound pickup means and from the L channel and R channel received signals from the communication partner;
a speaker position detection step of detecting speaker positions from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix calculation step of calculating a covariance matrix from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix storage step of storing the covariance matrix for each reception section, each noise section, and each speaker position;
an L channel mixing coefficient setting step of setting in advance an L channel mixing coefficient corresponding to each speaker position;
an L channel filter coefficient calculation step of calculating L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to that speaker position and that the received-signal component and the noise component are suppressed;
an R channel mixing coefficient setting step of setting in advance an R channel mixing coefficient corresponding to each speaker position;
an R channel filter coefficient calculation step of calculating R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to that speaker position and that the received-signal component and the noise component are suppressed;
an L channel filter step of filtering each of the signals received by the plurality of sound pickup means with the L channel filter coefficients;
an R channel filter step of filtering each of the signals received by the plurality of sound pickup means with the R channel filter coefficients;
an L channel addition step of adding the output signals of the L channel filter step; and
an R channel addition step of adding the output signals of the R channel filter step.

受話信号とマイクロホン受音信号から受話区間、送話区間、雑音区間を検出し、送話区間であった場合に話者位置を推定し、各話者位置に対する共分散行列と雑音の共分散行列とエコーの共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、雑音とエコーが抑圧され、各話者からの音声信号がLチャネルとRチャネルでレベル差を持ち、高品質な音声での通話と良好な音像の定位が実現する。 The reception sections, transmission sections, and noise sections are detected from the received (far-end) signals and the microphone signals, and the speaker position is estimated when a transmission section is detected. A covariance matrix for each speaker position, a noise covariance matrix, and an echo covariance matrix are stored and used to obtain the L channel and R channel filter coefficients; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. Noise and echo are thereby suppressed, and the voice signal from each speaker has a level difference between the L channel and the R channel, realizing calls with high-quality speech and good sound-image localization.

本発明の第4の態様によれば、収音方法は、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出段階と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階を有する。
According to the fourth aspect of the present invention, the sound collection method comprises:
a transmission/reception detection step of detecting transmission sections, reception sections, and noise sections from the received signals picked up by each of a plurality of sound pickup means and from the L channel and R channel received signals from the communication partner;
a speaker position detection step of detecting speaker positions from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix calculation step of calculating a covariance matrix from the received signals picked up by each of the plurality of sound pickup means, the L channel received signal, and the R channel received signal;
a covariance matrix storage step of storing the covariance matrix for each reception section, each noise section, and each speaker position;
an L channel mixing coefficient setting step of setting in advance an L channel mixing coefficient corresponding to each speaker position;
an L channel filter coefficient calculation step of calculating L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to that speaker position and that the received-signal component and the noise component are suppressed;
an R channel mixing coefficient setting step of setting in advance an R channel mixing coefficient corresponding to each speaker position;
an R channel filter coefficient calculation step of calculating R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to that speaker position and that the received-signal component and the noise component are suppressed;
an L channel filter step of filtering the signals received by each of the plurality of sound pickup means, the L channel received signal, and the R channel received signal with the L channel filter coefficients;
an R channel filter step of filtering the signals received by each of the plurality of sound pickup means, the L channel received signal, and the R channel received signal with the R channel filter coefficients;
an L channel addition step of adding the output signals of the L channel filter step; and
an R channel addition step of adding the output signals of the R channel filter step.

複数マイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。第3の態様の収音方法に受話信号をフィルタリングすることを追加したことにより、第3の態様の収音方法よりも高いエコー抑圧が実現する。 Reception sections, noise sections, and transmission sections are detected from the signals received by a plurality of microphones; in transmission sections the speaker position is detected and covariance matrices are obtained; filter coefficients are obtained that suppress echo and noise and give the desired level difference between the L and R channels for each speaker position; and the microphone signals are filtered for each channel with these filter coefficients, so that echo and noise are suppressed and a stereo output signal having the desired level difference for each speaker position is obtained. Because the received signals are also filtered, in addition to the processing of the sound collection method of the third aspect, echo suppression higher than that of the third aspect is realized.

本発明の第1の実施態様によれば、収音方法は、
前記記憶された各話者の共分散行列から各話者の音声レベルを推定する話者音声レベル推定段階と、
前記各話者の音声レベルから、各話者音声が適正レベルで出力されるための各話者に対するゲインを各々算出するゲイン算出部とをさらに有し、
前記Lチャネルフィルタ係数計算段階は、さらに前記各話者に対するゲインが乗算され、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出し、
前記Rチャネルフィルタ係数計算段階は、さらに前記各話者に対するゲインが乗算され、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出する。
According to the first embodiment of the present invention, the sound collection method further comprises:
a speaker voice level estimation step of estimating the voice level of each speaker from the stored covariance matrix of each speaker; and
a gain calculation part that calculates, from the voice level of each speaker, a gain for each speaker such that each speaker's voice is output at an appropriate level,
wherein the L channel filter coefficient calculation step calculates the L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients under the further conditions that the gain for each speaker is multiplied in, the received-signal component is suppressed, and the noise component is suppressed, and
the R channel filter coefficient calculation step calculates the R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients under the further conditions that the gain for each speaker is multiplied in, the received-signal component is suppressed, and the noise component is suppressed.

複数マイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与え、各話者から発せられた音を適正レベルで収音するフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、各話者から発せられた音を適正レベルで収音し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。 Reception sections, noise sections, and transmission sections are detected from the signals received by a plurality of microphones; in transmission sections the speaker position is detected and covariance matrices are obtained; filter coefficients are obtained that suppress echo and noise, give the desired level difference between the L and R channels for each speaker position, and pick up the sound emitted by each speaker at an appropriate level; and the microphone signals are filtered for each channel with these filter coefficients, so that echo and noise are suppressed, the sound emitted by each speaker is picked up at an appropriate level, and a stereo output signal having the desired level difference for each speaker position is obtained.

本発明の第2の実施態様によれば、収音方法は、
前記記憶された共分散行列のうち対角成分で最もパワーの大きい成分、または前記記憶された共分散行列の対角成分の加算値の周波数特性を平滑化するゲインを、前記記憶された共分散行列に乗算し、白色化された共分散行列を、前記Lチャネルフィルタ係数計算段階と前記Rチャネルフィルタ係数計算段階に入力する白色化段階をさらに有する。
According to the second embodiment of the present invention, the sound collection method further comprises a whitening step of multiplying the stored covariance matrices by a gain that smooths the frequency characteristic of either the diagonal element with the largest power or the sum of the diagonal elements of the stored covariance matrices, and of inputting the whitened covariance matrices to the L channel filter coefficient calculation step and the R channel filter coefficient calculation step.

共分散行列の白色化により、音源の周波数特性に依存しないフィルタを求めることができる。これにより、音源の周波数特性が変化しても、フィルタ係数の変化がなく、本発明の処理による音色の変化を防ぐことができる。   By whitening the covariance matrix, a filter that does not depend on the frequency characteristics of the sound source can be obtained. Thereby, even if the frequency characteristic of the sound source changes, there is no change in the filter coefficient, and the change in timbre due to the processing of the present invention can be prevented.
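A minimal Python sketch of such a whitening step is given below. It assumes the gain is taken per frequency bin as the reciprocal of the summed diagonal elements of the covariance matrix; the patent also allows using the largest-power diagonal element, and its exact smoothing of the spectrum is not reproduced here.

import numpy as np

def whiten_covariances(R, eps=1e-12):
    """Scale each per-bin covariance matrix so that its overall power spectrum is flat.
    The gain used here is 1 / trace(R(omega)); using the largest diagonal element
    instead, or smoothing the spectrum first, are the variants mentioned in the text."""
    R = np.asarray(R)                       # shape (n_bins, M, M)
    power = np.einsum("kii->k", R).real     # sum of diagonal elements per bin
    gain = 1.0 / (power + eps)
    return R * gain[:, None, None]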

本発明の第3の実施態様によれば、収音方法は、
前記複数の収音手段の各々で受音された信号および前記受話信号の時間領域信号から周波数領域信号に変換するFFT段階と、
前記Lチャネル加算段階と前記Rチャネル加算段階の出力信号を周波数領域信号から時間領域信号に変換するIFFT段階をさらに有し、
前記各段階は周波数領域で演算する。
According to the third embodiment of the present invention, the sound collection method further comprises:
an FFT step of converting the signals received by each of the plurality of sound pickup means and the received (far-end) signals from time-domain signals to frequency-domain signals; and
an IFFT step of converting the output signals of the L channel addition step and the R channel addition step from frequency-domain signals to time-domain signals,
wherein each of the above steps operates in the frequency domain.

これにより、時間領域の演算に比べ低演算量を実現できる。   Thereby, a low calculation amount can be realized as compared with the calculation in the time domain.

本発明の第4の実施態様によれば、収音方法は、
前記LおよびRチャネルフィルタ係数計算段階と前記LおよびRチャネルフィルタ段階と前記LおよびRチャネル加算段階を、3チャネル以上の1〜Jチャネルフィルタ係数計算段階と1〜Jチャネルフィルタ段階と1〜Jチャネル加算段階に置き換えている。
According to the fourth embodiment of the present invention, in the sound collection method the L and R channel filter coefficient calculation steps, the L and R channel filter steps, and the L and R channel addition steps are replaced with filter coefficient calculation steps, filter steps, and addition steps for three or more channels 1 to J.

複数マイクロホンで収音した信号および受話信号から、以下の条件を満たす指向性を形成するLチャネルフィルタ係数とRチャネルフィルタ係数を求める。(条件1)マイクロホンから距離が離れている話者の音声を適切レベル(聞き取りやすいレベル)にする。(条件2)マイクロホンから見た開き角が小さい位置に話者がいる場合でもLRチャネルのレベル差を所望のレベル差(音像の定位感のあるレベル差)とする。(条件3)雑音と受話信号を抑圧する。次に、求められたLチャネルフィルタ係数とRチャネルフィルタ係数で複数マイクロホンで収音した信号および受話信号をフィルタリングし、それらの出力をLチャネル、Rチャネルごとに加算する。   An L channel filter coefficient and an R channel filter coefficient that form directivity that satisfies the following conditions are obtained from signals and received signals collected by a plurality of microphones. (Condition 1) The voice of a speaker who is far away from the microphone is set to an appropriate level (a level that is easy to hear). (Condition 2) Even when a speaker is at a position where the opening angle viewed from the microphone is small, the level difference of the LR channel is set to a desired level difference (a level difference with a sense of localization of a sound image). (Condition 3) Noise and received signal are suppressed. Next, the signal collected by the plurality of microphones and the received signal are filtered with the obtained L channel filter coefficient and R channel filter coefficient, and their outputs are added for each L channel and R channel.

これにより、マイクロホンから距離が離れている話者とマイクロホンに近い話者の音声レベルが適切となり、聞き取りやすい音量での通話が実現する。また、マイクロホンから見た開き角が小さい位置に話者がいる場合でもLRチャネル間で所望のレベル差となり、音像の定位感のある収音を実現する。さらに、雑音と受話信号を抑圧した高品質な音声での通信が実現する。   As a result, the voice levels of the speaker far from the microphone and the speaker close to the microphone are appropriate, and a call with a volume that is easy to hear is realized. Further, even when a speaker is located at a position where the opening angle viewed from the microphone is small, a desired level difference is obtained between the LR channels, and sound collection with a sense of localization of the sound image is realized. Furthermore, high-quality voice communication with suppressed noise and received signal is realized.

[第1の実施形態]
図1は本発明の第1の実施形態の収音装置のブロック図である。
[First Embodiment]
FIG. 1 is a block diagram of a sound collecting apparatus according to a first embodiment of the present invention.

本実施形態の収音装置は、マイクロホン1011〜101Mと、Lチャネルフィルタ102L1〜102LMと、Rチャネルフィルタ102R1〜102RMと、Lチャネル加算器103Lと、Rチャネル加算器103Rと、話者位置検出部105と、共分散行列計算部104と、共分散行列記憶部106と、Lチャネルフィルタ係数計算部107Lと、Rチャネルフィルタ係数計算部107Rと、Lチャネルミキシング係数設定部203Lと、Rチャネルミキシング係数設定部203Rにより構成される。 The sound collection device of this embodiment comprises microphones 1011 to 101M, L channel filters 102L1 to 102LM, R channel filters 102R1 to 102RM, an L channel adder 103L, an R channel adder 103R, a speaker position detection unit 105, a covariance matrix calculation unit 104, a covariance matrix storage unit 106, an L channel filter coefficient calculation unit 107L, an R channel filter coefficient calculation unit 107R, an L channel mixing coefficient setting unit 203L, and an R channel mixing coefficient setting unit 203R.

本実施形態は、話者位置を推定し、各話者位置に対する共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、各話者からの音声信号がLチャネルとRチャネルで所望のレベル差を持ち、良好な音像の定位が実現する。 In this embodiment, the speaker positions are estimated, a covariance matrix for each speaker position is stored, and the L channel and R channel filter coefficients are obtained from these matrices; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. The voice signal from each speaker thereby has the desired level difference between the L channel and the R channel, and good sound-image localization is realized.

まず、話者位置検出部105は、マイクロホン1011〜101Mで受音したマイクロホン受音信号から、話者の位置を検出する。音源位置の推定方法は、例えば相互相関法による方法がある。 First, the speaker position detection unit 105 detects the position of the speaker from the microphone reception signal received by the microphones 101 1 to 101 M. As a sound source position estimation method, for example, there is a method based on a cross-correlation method.

M個のマイクロホンがあると想定し、i番目マイクロホン100iとj番目マイクロホン100jで受音された信号より求められる受音信号間遅延時間差を Assuming that there are M microphones, the delay time difference between the received sound signals obtained from the signals received by the i-th microphone 100 i and the j-th microphone 100 j is

Figure 0004298466
Figure 0004298466

とする。受音信号間遅延時間差は、信号間の相互相関を求め、その最大ピーク位置から求めることができる。次に、m番目の受音位置を(xm,ym,zm)、推定音源位置を And The delay time difference between the received sound signals can be obtained from the maximum peak position by obtaining the cross-correlation between the signals. Next, the mth sound receiving position is (x m , y m , z m ), and the estimated sound source position is

Figure 0004298466
Figure 0004298466

と表す。これらの位置から求められる推定受音信号間遅延時間差 The estimated delay time difference between the received signals, obtained from these positions,

Figure 0004298466
Figure 0004298466

は、式(1)で表される。 Is represented by Formula (1).

Figure 0004298466
Figure 0004298466

次に、受音信号間遅延時間差 Next, the delay time difference between the received signals

Figure 0004298466
Figure 0004298466

に音速cを乗じ距離に換算したものを、それぞれ受音位置間距離差 Multiplied by the speed of sound c and converted into a distance,

Figure 0004298466
Figure 0004298466

とし、測定値 And measured value

Figure 0004298466
Figure 0004298466

と推定値 And estimated value

Figure 0004298466
Figure 0004298466

の二乗平均誤差 Mean square error of

Figure 0004298466
Figure 0004298466

を求めれば、式(2)となる。 Is obtained, Equation (2) is obtained.

Figure 0004298466
Figure 0004298466

式(2)の二乗平均誤差   Mean square error of equation (2)

Figure 0004298466
Figure 0004298466

を最小化する解を求めれば、受音信号間遅延時間差の測定値と推定値の誤差が最小となる推定音源位置を求めることができる。ただし、式(2)は非線形連立方程式となっており、解析的に解くことは困難であるので、逐次修正を用いた数値解析により求める。 If the solution for minimizing the difference is obtained, the estimated sound source position at which the error between the measured value and the estimated value of the delay time difference between the received sound signals is minimized can be obtained. However, since Equation (2) is a nonlinear simultaneous equation and difficult to solve analytically, it is obtained by numerical analysis using sequential correction.

式(2)を最小化する推定音源位置   Estimated sound source position that minimizes Equation (2)

Figure 0004298466
Figure 0004298466

を求めるには、ある点における勾配を求め、誤差が小さくなる方向に推定音源位置を修正していき、勾配が0となる点を求めればよいので、修正式は式(3)のようになる。 Is obtained by calculating the gradient at a certain point, correcting the estimated sound source position in the direction in which the error becomes smaller, and determining the point at which the gradient becomes 0. The correction formula is as shown in equation (3). .

Figure 0004298466
Figure 0004298466

以上、式(3)を繰返し計算することで、誤差が最小となる推定音源位置を求めることができる。 As described above, the estimated sound source position where the error is minimized can be obtained by repeatedly calculating Expression (3).
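As an illustration only, the following Python sketch follows the same idea: measure inter-microphone delays from cross-correlation peaks, convert them to distance differences, and iteratively move a source estimate so as to reduce the squared error of equation (2). The step size, iteration count, and initial guess are assumptions, and the update is a plain gradient step rather than the exact correction formula (3), which is given only as an image.

import numpy as np

def delay_diff(xi, xj, fs):
    """Delay (s) of xj relative to xi, taken from the cross-correlation peak."""
    corr = np.correlate(xi, xj, mode="full")
    lag = np.argmax(corr) - (len(xj) - 1)
    return lag / fs

def estimate_source_position(mic_signals, mic_positions, fs, c=340.0, steps=200, mu=0.01):
    """Iteratively move the source estimate so that the modelled inter-microphone
    distance differences match the measured ones (cf. equations (1)-(3))."""
    mic_positions = np.asarray(mic_positions, dtype=float)
    M = len(mic_positions)
    pairs = [(i, j) for i in range(M) for j in range(i + 1, M)]
    d_meas = {(i, j): c * delay_diff(mic_signals[i], mic_signals[j], fs) for i, j in pairs}
    p = mic_positions.mean(axis=0) + 1.0            # assumed initial guess offset from the array centre
    for _ in range(steps):
        grad = np.zeros(3)
        for i, j in pairs:
            ri = np.linalg.norm(p - mic_positions[i])
            rj = np.linalg.norm(p - mic_positions[j])
            err = (ri - rj) - d_meas[(i, j)]        # one squared-error term of equation (2)
            grad += 2.0 * err * ((p - mic_positions[i]) / ri - (p - mic_positions[j]) / rj)
        p -= mu * grad                              # move against the gradient, in the spirit of equation (3)
    return p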

次に、共分散行列計算部104では、マイクロホン受音信号の共分散を求め、それを行列にする。まず、マイクロホン受音信号の周波数領域変換信号をX1(ω)〜XM(ω)とする。これらの信号の共分散行列 Next, the covariance matrix calculation unit 104 computes the covariances of the microphone signals and arranges them as a matrix. First, let the frequency-domain transforms of the microphone signals be X1(ω) to XM(ω). The covariance matrix of these signals

Figure 0004298466
Figure 0004298466

は、式(9)により算出される。 Is calculated by equation (9).

Figure 0004298466
Figure 0004298466
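Equation (9) itself is shown only as an image; the sketch below assumes the common estimate obtained by averaging X(ω)X(ω)^H over short-time FFT frames of the M microphone signals.

import numpy as np

def covariance_matrices(mic_signals, frame_len=512, hop=256):
    """Per-frequency covariance matrices R(omega) ~ average of X(omega) X(omega)^H,
    estimated over short-time FFT frames (assumed reading of equation (9))."""
    X = np.stack(mic_signals)                        # shape (M, n_samples)
    M, n = X.shape
    win = np.hanning(frame_len)
    n_bins = frame_len // 2 + 1
    R = np.zeros((n_bins, M, M), dtype=complex)
    frames = 0
    for start in range(0, n - frame_len + 1, hop):
        spec = np.fft.rfft(win * X[:, start:start + frame_len], axis=1)   # (M, n_bins)
        for k in range(n_bins):
            x = spec[:, k][:, None]                  # column vector X(omega_k)
            R[k] += x @ x.conj().T
        frames += 1
    return R / max(frames, 1)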

次に、共分散行列記憶部106では、話者位置検出部105の検出結果に基づき、共分散行列   Next, in the covariance matrix storage unit 106, the covariance matrix is based on the detection result of the speaker position detection unit 105.

Figure 0004298466
Figure 0004298466

を、各音源位置に対する共分散行列 Is the covariance matrix for each source location

Figure 0004298466
Figure 0004298466

として保存する。 Save as.

Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を所望のレベル差で収音するためのフィルタ係数を計算する。まず、各マイクロホンに接続されたLチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMのフィルタ係数を周波数領域に変換したものを、それぞれHL,1(ω)〜HL,M(ω)とHR,1(ω)〜HR,M(ω)とする。次に、これらのフィルタ係数を式(10)と式(11)により行列としたものを The L channel filter coefficient calculation unit 107 L and the R channel filter coefficient calculation unit 107 R calculate filter coefficients for collecting sounds emitted from the speakers with a desired level difference. First, the filter coefficients of the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM connected to the respective microphones are converted to the frequency domain, respectively, H L, 1 (ω) to H L, M (Ω) and H R, 1 (ω) to H R, M (ω). Next, these filter coefficients are converted into a matrix according to equations (10) and (11).

Figure 0004298466
Figure 0004298466

とする。 And

Figure 0004298466
Figure 0004298466

また、i番目音源が発音している期間のマイクロホン受音信号の周波数領域変換信号をXSi,1(ω)〜XSi,M(ω)とする。 Also, let X Si, 1 (ω) to X Si, M (ω) be the frequency domain transform signal of the microphone sound reception signal during the period when the i-th sound source is sounding.

ここで、フィルタ係数行列   Where the filter coefficient matrix

Figure 0004298466
Figure 0004298466

に要求される条件は、i番目話者のマイクロホン受音信号XSi,1(ω)〜XSi,M(ω)をフィルタ係数行列 The condition required of these is that, when the microphone signals XSi,1(ω) to XSi,M(ω) of the i-th speaker are filtered with the filter coefficient matrices

Figure 0004298466
Figure 0004298466

でそれぞれフィルタリングし、フィルタリング後の信号をチャネルごとに加算したときに、LチャネルとRチャネルの各話者音声信号がi番目話者位置(Xi,Yi,Zi)に対応した所望のレベル差PDiff(Xi,Yi,Zi)(良好な音像定位を実現するレベル差であり、あらかじめ話者位置ごとに設定される)となっていることである。したがって、各音源の信号をそれぞれフィルタリングおよび加算した信号が所望のレベル差PDiff(Xi,Yi,Zi)となるようにM行のミキシング係数行列 respectively, and the filtered signals are added for each channel, the L channel and R channel speaker voice signals have the desired level difference PDiff(Xi, Yi, Zi) corresponding to the i-th speaker position (Xi, Yi, Zi) (a level difference that realizes good sound-image localization, set in advance for each speaker position). Therefore, so that the filtered and added signal of each sound source has the desired level difference PDiff(Xi, Yi, Zi), an M-row mixing coefficient matrix

Figure 0004298466
Figure 0004298466

をマイクロホン受音信号にそれぞれ乗じた信号となる式(12)と式(13)が理想条件となる。 is multiplied by the respective microphone signals, and equations (12) and (13), in which the filtered outputs equal these mixed signals, are the ideal conditions.

Figure 0004298466
Figure 0004298466

所望のレベル差PDiff(Xi,Yi,Zi)を実現するミキシング係数行列 Mixing coefficient matrix realizing desired level difference P Diff (X i , Y i , Z i )

Figure 0004298466
Figure 0004298466

は、Lチャネルミキシング係数設定部203LとRチャネルミキシング係数設定部203Rで話者位置ごとにあらかじめ設定されている。例えば以下に述べるように設定される。図9は本発明において実現する指向性例を示した図である。図9に示すマイクロホンと話者の配置では、マイクロホンから見た開き角が小さい位置に複数の話者が存在する。このような場合、従来技術のステレオマイクロホンでは、ほとんど音像の定位感を得ることはできない。そこで、本発明では音像の定位感を強調する指向性を形成する。例えば、話者位置3に対しては、Lチャネルのレベルを大きくし、Rチャネルのレベルを小さくして、大きなレベル差が付くようにする。このときミキシング係数行列は is set in advance for each speaker position by the L channel mixing coefficient setting unit 203L and the R channel mixing coefficient setting unit 203R, for example as described below. FIG. 9 shows an example of the directivity realized by the present invention. In the arrangement of microphones and speakers shown in FIG. 9, several speakers are located at positions where the opening angle viewed from the microphones is small. In such a case, a conventional stereo microphone can hardly produce any sense of sound-image localization. The present invention therefore forms directivity that emphasizes the sense of sound-image localization. For example, for speaker position 3, the L channel level is made large and the R channel level small so that a large level difference is obtained. In this case the mixing coefficient matrix is set

Figure 0004298466
Figure 0004298466

のように設定する。このようにすることで、話者位置3からの発話に対して20dBのレベル差を得ることができる。他の話者位置に対しても同様にミキシング係数行列を設定すれば、図9に示すような指向性が得られ、良好な音像定位を実現できる。 as shown above. In this way, a level difference of 20 dB can be obtained for utterances from speaker position 3. If mixing coefficient matrices are set similarly for the other speaker positions, the directivity shown in FIG. 9 is obtained and good sound-image localization can be realized.
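The concrete matrix in the patent is shown only as an image; the following hypothetical values merely illustrate how a 20 dB L/R level difference for speaker position 3 could be encoded (the number of microphones M and the coefficient values are assumptions).

import numpy as np

# Hypothetical mixing vectors for speaker position 3 with M = 4 microphones.
# The L channel keeps the speaker at full level while the R channel keeps it at
# 1/10 amplitude, i.e. a 20 dB level difference: 20*log10(1.0 / 0.1) = 20 dB.
M = 4
a_L3 = np.full(M, 1.0 / M)     # L channel mixing coefficients for speaker position 3
a_R3 = np.full(M, 0.1 / M)     # R channel mixing coefficients for speaker position 3
level_diff_db = 20 * np.log10(np.sum(a_L3) / np.sum(a_R3))
print(level_diff_db)           # -> 20.0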

次に、式(12)と式(13)の条件をフィルタ係数行列   Next, the conditions of the equations (12) and (13) are changed to the filter coefficient matrix.

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(14)と式(15)となる。ただし、CSiは音源位置の感度拘束に対する重みの定数であり、値が大きくなるほど感度拘束が強くなる。 Is solved by least squares solution, Equations (14) and (15) are obtained. However, C Si is a constant of weight for the sensitivity constraint of the sound source position, and the sensitivity constraint becomes stronger as the value increases.

Figure 0004298466
Figure 0004298466

以上で、所望のレベル差を得るためのフィルタ係数を求める式(14)と式(15)を導出した。   Thus, the equations (14) and (15) for obtaining the filter coefficient for obtaining the desired level difference are derived.
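Equations (14) and (15) appear only as images. Assuming the usual weighted least-squares reading of conditions (12) and (13), namely minimising, per frequency bin, the sum over speaker positions of C_Si times the power of the difference between the filtered output and the mixed target, the solution takes the form sketched below; this is an assumed reconstruction, not a transcription of the patent's formula.

import numpy as np

def filter_coefficients(R_list, a_list, C_list):
    """Assumed per-bin weighted least-squares filter:
       H(omega) = ( sum_i C_Si R_Si(omega) )^-1 ( sum_i C_Si R_Si(omega) a_i ).
    R_list : covariance matrices (n_bins, M, M), one entry per speaker position
    a_list : length-M mixing-coefficient vectors, one per speaker position
    C_list : sensitivity-constraint weights C_Si."""
    n_bins, M, _ = R_list[0].shape
    H = np.zeros((n_bins, M), dtype=complex)
    for k in range(n_bins):
        A = np.zeros((M, M), dtype=complex)
        b = np.zeros(M, dtype=complex)
        for R, a, C in zip(R_list, a_list, C_list):
            A += C * R[k]
            b += C * (R[k] @ a)
        H[k] = np.linalg.solve(A + 1e-9 * np.eye(M), b)   # small ridge term for numerical stability
    return H

# The L and R channel filters differ only in the mixing vectors passed in, e.g.:
# H_L = filter_coefficients(R_speakers, a_L_list, C_list)
# H_R = filter_coefficients(R_speakers, a_R_list, C_list)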

次に、式(14)と式(15)により求められた、Lチャネルフィルタ係数   Next, the L channel filter coefficient obtained by the equations (14) and (15)

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ係数 And R channel filter coefficients

Figure 0004298466
Figure 0004298466

は、Lチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMにそれぞれコピーされ、マイクロホン受音信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 Are copied to the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM , respectively, and respectively filter the microphone reception signals. The filtered signals are added by adders 103 L and 103 R for each channel, and output as a stereo output signal.
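A sketch of this filtering-and-adding stage in the frequency domain is shown below; windowing and overlap-add reconstruction are omitted, and the array shapes are assumptions.

import numpy as np

def apply_channel_filter(spec_frames, H):
    """Filter the M microphone spectra with per-bin coefficients H of shape (n_bins, M)
    and add across microphones, giving one output channel per frame.
    spec_frames : short-time spectra of the microphones, shape (n_frames, M, n_bins)."""
    # multiply each microphone spectrum by its filter coefficient and sum over microphones
    return np.einsum("fmk,km->fk", spec_frames, H)

# y_L = apply_channel_filter(frames, H_L)   # L channel output spectra
# y_R = apply_channel_filter(frames, H_R)   # R channel output spectra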

以上示したように、本実施形態では、複数マイクロホンの受音信号から話者位置を検出し、共分散行列を求め、話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。   As described above, in this embodiment, the speaker position is detected from the received signals of a plurality of microphones, a covariance matrix is obtained, and a filter coefficient that gives a desired level difference between LR channels is obtained for each speaker position. By filtering the microphone sound reception signal for each channel using these filter coefficients, a stereo output signal having a desired level difference for each speaker position can be obtained.

[第2の実施形態]
本発明の第2の実施形態の収音装置について説明する。本実施形態のブロック図は、第1の実施形態と同じ図1である。本実施形態は、第1の実施形態の収音装置に雑音抑圧機能を加えたものである。
[Second Embodiment]
A sound collection device according to a second embodiment of the present invention will be described. The block diagram of this embodiment is FIG. 1 which is the same as that of the first embodiment. In the present embodiment, a noise suppression function is added to the sound collection device of the first embodiment.

本実施形態の収音装置は、マイクロホン1011〜101Mと、Lチャネルフィルタ102L1〜102LMと、Rチャネルフィルタ102R1〜102RMと、Lチャネル加算器103Lと、Rチャネル加算器103Rと、話者位置検出部105と、共分散行列計算部104と、共分散行列記憶部106と、Lチャネルフィルタ係数計算部107Lと、Rチャネルフィルタ係数計算部107Rと、Lチャネルミキシング係数設定部203Lと、Rチャネルミキシング係数設定部203Rにより構成される。 The sound collection device of this embodiment comprises microphones 1011 to 101M, L channel filters 102L1 to 102LM, R channel filters 102R1 to 102RM, an L channel adder 103L, an R channel adder 103R, a speaker position detection unit 105, a covariance matrix calculation unit 104, a covariance matrix storage unit 106, an L channel filter coefficient calculation unit 107L, an R channel filter coefficient calculation unit 107R, an L channel mixing coefficient setting unit 203L, and an R channel mixing coefficient setting unit 203R.

本実施形態は、話者位置および雑音区間を推定し、各話者位置に対する共分散行列と雑音に対する共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、雑音の抑圧と、各話者からの音声信号がLチャネルとRチャネルでレベル差を持った、良好な音像の定位が実現する。 In this embodiment, the speaker positions and noise sections are estimated, a covariance matrix for each speaker position and a covariance matrix for the noise are stored, and the L channel and R channel filter coefficients are obtained from these matrices; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. This realizes noise suppression together with good sound-image localization in which the voice signal from each speaker has a level difference between the L channel and the R channel.

まず、話者位置検出部105は、マイクロホン1011〜101Mで受音したマイクロホン受音信号のパワーから雑音区間と発話区間を検出する。例えば、それぞれのマイクロホン受音信号について、短時間平均パワー(0.1〜1s程度)と、長時間平均パワー(1s〜100s程度)を求め、短時間平均パワーと長時間平均パワーの比が雑音区間の閾値未満の場合に雑音区間と判定し、発話の閾値以上の場合に発話区間と判定する。発話区間と判定された場合は、第1の実施形態と同様にして、話者位置を検出する。 First, the speaker position detection unit 105 detects noise sections and speech sections from the power of the microphone signals received by the microphones 1011 to 101M. For example, for each microphone signal, the short-time average power (about 0.1 to 1 s) and the long-time average power (about 1 s to 100 s) are obtained; when the ratio of the short-time to the long-time average power is below the noise threshold the section is judged to be a noise section, and when it is at or above the speech threshold it is judged to be a speech section. When a speech section is detected, the speaker position is detected as in the first embodiment.
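A rough sketch of this section classification is shown below; the window lengths follow the ranges given above, while the threshold values are purely illustrative assumptions.

import numpy as np

def classify_samples(x, fs, short_win=0.5, long_win=10.0,
                     noise_thresh=0.5, speech_thresh=2.0):
    """Label each sample 'noise', 'speech', or 'other' from the ratio of
    short-time to long-time average power; thresholds are illustrative only."""
    p = x.astype(float) ** 2
    short = np.convolve(p, np.ones(int(short_win * fs)) / (short_win * fs), mode="same")
    long_ = np.convolve(p, np.ones(int(long_win * fs)) / (long_win * fs), mode="same")
    ratio = short / (long_ + 1e-12)
    labels = np.full(len(x), "other", dtype=object)
    labels[ratio < noise_thresh] = "noise"
    labels[ratio >= speech_thresh] = "speech"
    return labels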

次に、共分散行列計算部104は、第1の実施形態と同様にして、共分散行列   Next, the covariance matrix calculation unit 104 performs the covariance matrix in the same manner as in the first embodiment.

Figure 0004298466
Figure 0004298466

を算出する。 Is calculated.

共分散行列記憶部108では、話者位置検出部105の検出結果に基づき、共分散行列   In the covariance matrix storage unit 108, the covariance matrix is based on the detection result of the speaker position detection unit 105.

Figure 0004298466
Figure 0004298466

を、各音源位置に対する共分散行列 Is the covariance matrix for each source location

Figure 0004298466
Figure 0004298466

と雑音の共分散行列 And noise covariance matrix

Figure 0004298466
Figure 0004298466

として保存する。 Save as.

Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を所望のレベル差で収音し、雑音を抑圧するためのフィルタ係数を計算する。ここで、各マイクロホンに接続されたLチャネルフィルタ The L channel filter coefficient calculation unit 107L and the R channel filter coefficient calculation unit 107R calculate filter coefficients for picking up the sound emitted by each speaker with the desired level difference and for suppressing noise. Here, the L channel filters connected to the microphones

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ And R channel filter

Figure 0004298466
Figure 0004298466

に要求される条件は、以下の2つである。1つ目は、第1の実施形態の式(12)と式(13)に示したLチャネルとRチャネルの各話者音声信号がi番目の話者位置(Xi,Yi,Zi)に対応した所望のレベル差PDiff(Xi,Yi,Zi)となる条件である。2つ目は、雑音を抑圧する条件であり、マイクロホン受音信号の雑音成分XN,1(ω)〜XN,M(ω)がフィルタ部に入力された場合に各チャネルの出力が0となる式(16)と式(17)である。 Two conditions are required of these filters. The first is the condition, shown in equations (12) and (13) of the first embodiment, that the L channel and R channel speaker voice signals have the desired level difference PDiff(Xi, Yi, Zi) corresponding to the i-th speaker position (Xi, Yi, Zi). The second is the noise suppression condition, given by equations (16) and (17), that the output of each channel becomes 0 when the noise components XN,1(ω) to XN,M(ω) of the microphone signals are input to the filter section.

Figure 0004298466
Figure 0004298466

次に、式(12)と式(16)、式(13)と式(17)の条件をフィルタ係数行列   Next, the conditions of Expression (12) and Expression (16), Expression (13), and Expression (17) are changed to the filter coefficient matrix.

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(18)と式(19)となる。ただし、CSiは音源位置の感度拘束に対する重みの定数であり、値が大きくなるほど感度拘束が強くなる。CNは雑音抑圧に対する重みの定数であり、値が大きくなるほど雑音抑圧量が増加する。 Is solved by the least squares solution, equations (18) and (19) are obtained. However, C Si is a constant of weight for the sensitivity constraint of the sound source position, and the sensitivity constraint becomes stronger as the value increases. C N is a constant of the weight for noise suppression, and the noise suppression amount increases as the value increases.

Figure 0004298466
Figure 0004298466

以上で、雑音を抑圧し、所望のレベル差を得るためのフィルタ係数を求める式(18)と式(19)を導出した。   Thus, equations (18) and (19) for obtaining filter coefficients for suppressing noise and obtaining a desired level difference are derived.

次に、式(18)と式(19)により求められた、Lチャネルフィルタ係数   Next, the L channel filter coefficient obtained by Expression (18) and Expression (19)

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ係数 And R channel filter coefficients

Figure 0004298466
Figure 0004298466

は、Lチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMにそれぞれコピーされ、マイクロホン受音信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 Are copied to the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM , respectively, and respectively filter the microphone reception signals. The filtered signals are added by adders 103 L and 103 R for each channel, and output as a stereo output signal.

以上示したように、本実施形態では、複数マイクロホンの受音信号から、話者位置と雑音区間を検出し、共分散行列を求め、雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、雑音を抑圧し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。 As described above, in this embodiment the speaker positions and noise sections are detected from the signals received by a plurality of microphones, covariance matrices are obtained, filter coefficients are obtained that suppress noise and give the desired level difference between the L and R channels for each speaker position, and the microphone signals are filtered for each channel with these filter coefficients, so that noise is suppressed and a stereo output signal having the desired level difference for each speaker position is obtained.

[第3の実施形態]
図2は本発明の第3の実施形態の収音装置のブロック図である。
[Third Embodiment]
FIG. 2 is a block diagram of a sound collecting apparatus according to the third embodiment of the present invention.

本実施形態の収音装置は、マイクロホン1011〜101Mと、Lチャネルフィルタ102L1〜102LMと、Rチャネルフィルタ102R1〜102RMと、Lチャネル加算器103Lと、Rチャネル加算器103Rと、送受話検出部201と、話者位置検出部105と、共分散行列計算部104と、共分散行列記憶部106と、Lチャネルフィルタ係数計算部107Lと、Rチャネルフィルタ係数計算部107Rと、Lチャネルミキシング係数設定部203Lと、Rチャネルミキシング係数設定部203Rにより構成される。 The sound collection device of this embodiment comprises microphones 1011 to 101M, L channel filters 102L1 to 102LM, R channel filters 102R1 to 102RM, an L channel adder 103L, an R channel adder 103R, a transmission/reception detection unit 201, a speaker position detection unit 105, a covariance matrix calculation unit 104, a covariance matrix storage unit 106, an L channel filter coefficient calculation unit 107L, an R channel filter coefficient calculation unit 107R, an L channel mixing coefficient setting unit 203L, and an R channel mixing coefficient setting unit 203R.

本実施形態は、第1と第2のいずれかの実施形態の収音装置にエコー(マイクロホンにより収音された受話信号)抑圧機能を加えたものである。   In the present embodiment, an echo (received signal collected by a microphone) suppression function is added to the sound collection device of any one of the first and second embodiments.

本実施形態は、受話信号とマイクロホン受音信号から受話区間、送話区間、雑音区間を検出し、送話区間であった場合に話者位置を推定し、各話者位置に対する共分散行列と雑音の共分散行列とエコーの共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、雑音とエコーが抑圧され、各話者からの音声信号がLチャネルとRチャネルでレベル差を持ち、高品質な音声での通話と良好な音像の定位が実現する。 In this embodiment, reception sections, transmission sections, and noise sections are detected from the received (far-end) signals and the microphone signals, and the speaker position is estimated when a transmission section is detected. A covariance matrix for each speaker position, a noise covariance matrix, and an echo covariance matrix are stored and used to obtain the L channel and R channel filter coefficients; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. Noise and echo are thereby suppressed, and the voice signal from each speaker has a level difference between the L channel and the R channel, realizing calls with high-quality speech and good sound-image localization.

まず、送受話検出部201は、マイクロホン1011〜101Mで受音したマイクロホン受音信号とLチャネルおよびRチャネルの受話信号のパワーから受話区間と発話区間と雑音区間を検出する。例えば、各チャネルの受話信号について、短時間平均パワー(0.1〜1s程度)と、長時間平均パワー(1s〜100s程度)を求め、短時間平均パワーと長時間平均パワーの比が受話区間の閾値以上だった場合に受話区間と判定する。また、それぞれのマイクロホン受音信号について、短時間平均パワー(0.1〜1s程度)と長時間平均パワー(1s〜100s程度)を求め、短時間平均パワーと長時間平均パワーの比が雑音区間の閾値未満の場合に雑音区間と判定し、発話の閾値以上の場合に発話区間と判定する。 First, the transmission/reception detection unit 201 detects reception sections, speech sections, and noise sections from the power of the microphone signals received by the microphones 1011 to 101M and of the L channel and R channel received signals. For example, for the received signal of each channel, the short-time average power (about 0.1 to 1 s) and the long-time average power (about 1 s to 100 s) are obtained, and when the ratio of the short-time to the long-time average power is at or above the reception threshold, the section is judged to be a reception section. Likewise, for each microphone signal the short-time average power (about 0.1 to 1 s) and the long-time average power (about 1 s to 100 s) are obtained; when the ratio of the short-time to the long-time average power is below the noise threshold the section is judged to be a noise section, and when it is at or above the speech threshold it is judged to be a speech section.

話者位置検出部105は、送受話検出部201で発話区間と判定された場合に、第1の実施形態と同様にして、話者位置を検出する。   The speaker position detection unit 105 detects the speaker position in the same manner as in the first embodiment when the transmission / reception detection unit 201 determines that it is an utterance section.

次に、共分散行列計算部104は、第1の実施形態と同様にして、共分散行列   Next, the covariance matrix calculation unit 104 performs the covariance matrix in the same manner as in the first embodiment.

Figure 0004298466
Figure 0004298466

を算出する。 Is calculated.

共分散行列記憶部106は、送受話検出部201と話者位置検出部105の検出結果に基づき、共分散行列   The covariance matrix storage unit 106 is based on the detection results of the transmission / reception detection unit 201 and the speaker position detection unit 105, and uses a covariance matrix.

Figure 0004298466
Figure 0004298466

を、各音源位置に対する共分散行列 Is the covariance matrix for each source location

Figure 0004298466
Figure 0004298466

と雑音の共分散行列 And noise covariance matrix

Figure 0004298466
Figure 0004298466

とエコーの共分散行列 And echo covariance matrix

Figure 0004298466
Figure 0004298466

として保存する。 Save as.

Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を所望のレベル差で収音し、エコーと雑音を抑圧するためのフィルタ係数を計算する。ここで、各マイクロホンに接続されたLチャネルフィルタ The L channel filter coefficient calculation unit 107L and the R channel filter coefficient calculation unit 107R calculate filter coefficients for picking up the sound emitted by each speaker with the desired level difference and for suppressing echo and noise. Here, the L channel filters connected to the microphones

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ And R channel filter

Figure 0004298466
Figure 0004298466

に要求される条件は、以下の3つである。1つ目は、第1の実施形態の式(12)と式(13)に示したLチャネルとRチャネルの各話者音声信号がi番目の話者位置(Xi,Yi,Zi)に対応した所望のレベル差PDiff(Xi,Yi,Zi)となる条件である。2つ目は、第2の実施形態の式(16)と式(17)で示した雑音を抑圧する条件である。3つ目の条件は、エコー成分を抑圧する条件であり、マイクロホン受音信号のエコー成分XE,1(ω)〜XE,M(ω)がフィルタ部に入力された場合に各チャネルの出力が0となる式(20)と式(21)である。 Three conditions are required of these filters. The first is the condition, shown in equations (12) and (13) of the first embodiment, that the L channel and R channel speaker voice signals have the desired level difference PDiff(Xi, Yi, Zi) corresponding to the i-th speaker position (Xi, Yi, Zi). The second is the noise suppression condition shown in equations (16) and (17) of the second embodiment. The third is the echo suppression condition, given by equations (20) and (21), that the output of each channel becomes 0 when the echo components XE,1(ω) to XE,M(ω) of the microphone signals are input to the filter section.

Figure 0004298466
Figure 0004298466

次に、式(12)と式(16)と式(20)、式(13)と式(17)と式(21)の条件をフィルタ係数行列   Next, the conditions of Expression (12), Expression (16), Expression (20), Expression (13), Expression (17), and Expression (21) are set as the filter coefficient matrix.

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(22)と式(23)となる。ただし、CSiは音源位置の感度拘束に対する重みの定数であり、値が大きくなるほど感度拘束が強くなる。CNは雑音抑圧に対する重みの定数であり、値が大きくなるほど雑音抑圧量が増加する。CEはエコー抑圧に対する重みの定数であり、値が大きくなるほどエコー抑圧量が増加する。 Is solved by the least squares solution, equations (22) and (23) are obtained. However, C Si is a constant of weight for the sensitivity constraint of the sound source position, and the sensitivity constraint becomes stronger as the value increases. C N is a constant of the weight for noise suppression, and the noise suppression amount increases as the value increases. CE is a constant of weight for echo suppression, and the echo suppression amount increases as the value increases.

Figure 0004298466
Figure 0004298466

以上で、雑音とエコーを抑圧し、所望のレベル差を得るためのフィルタ係数を求める式(22)と式(23)を導出した。   As described above, the expressions (22) and (23) for obtaining the filter coefficients for suppressing the noise and the echo and obtaining the desired level difference are derived.
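Equations (18), (19), (22), and (23) are likewise shown only as images. Assuming that the noise and echo conditions enter the same weighted least-squares problem as zero-target penalty terms with weights C_N and C_E, the earlier filter_coefficients() sketch only needs the noise and echo covariance matrices added to the matrix being inverted; setting C_E = 0 recovers the noise-only case of the second embodiment. Again, this is an assumed form, not the patent's exact formula.

import numpy as np

def filter_coefficients_suppress(R_list, a_list, C_list, R_noise, C_N, R_echo, C_E):
    """Assumed per-bin solution:
       H = ( sum_i C_Si R_Si + C_N R_N + C_E R_E )^-1 ( sum_i C_Si R_Si a_i ).
    The noise and echo terms have a zero target output, so they contribute
    nothing to the right-hand side."""
    n_bins, M, _ = R_noise.shape
    H = np.zeros((n_bins, M), dtype=complex)
    for k in range(n_bins):
        A = C_N * R_noise[k] + C_E * R_echo[k]
        b = np.zeros(M, dtype=complex)
        for R, a, C in zip(R_list, a_list, C_list):
            A += C * R[k]
            b += C * (R[k] @ a)
        H[k] = np.linalg.solve(A + 1e-9 * np.eye(M), b)
    return H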

次に、式(22)と式(23)により求められた、Lチャネルフィルタ係数   Next, the L channel filter coefficient obtained by Equation (22) and Equation (23)

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ係数 And R channel filter coefficients

Figure 0004298466
Figure 0004298466

は、Lチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMにそれぞれコピーされ、マイクロホン受音信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 are copied to the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM , respectively, and filter the respective microphone signals. The filtered signals are added by the adders 103 L and 103 R for each channel and output as a stereo output signal.
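A minimal sketch of this filter-and-add stage, assuming frequency-domain processing with per-bin complex coefficients (names and shapes are illustrative only):

```python
import numpy as np

def filter_and_add(X, H_L, H_R):
    """X: (N_FREQ, M) microphone spectra; H_L, H_R: (N_FREQ, M) filter coefficients.

    Each microphone spectrum is multiplied by its per-channel coefficient and the
    products are summed per channel, giving the L and R output spectra of one frame.
    """
    y_L = np.sum(H_L * X, axis=1)   # L channel adder output
    y_R = np.sum(H_R * X, axis=1)   # R channel adder output
    return y_L, y_R
```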

以上示したように、本実施形態では、複数マイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。   As described above, in this embodiment, the reception, noise, and transmission intervals are detected from the signals received by the plurality of microphones; the speaker position is detected during the transmission interval; covariance matrices are obtained; filter coefficients are computed that suppress echo and noise and give the desired level difference between the L and R channels for each speaker position; and the microphone signals are filtered per channel with these coefficients. As a result, echo and noise are suppressed and a stereo output signal having the desired level difference for each speaker position is obtained.

[第4の実施形態]
図3は本発明の第4の実施形態の収音装置のブロック図である。
[Fourth Embodiment]
FIG. 3 is a block diagram of a sound collecting apparatus according to the fourth embodiment of the present invention.

本実施形態の収音装置は、マイクロホン1011〜101Mと、Lチャネルフィルタ102L1〜102LM、301LL、301LRと、Rチャネルフィルタ102R1〜102RM、301RL、301RRと、Lチャネル加算器103Lと、Rチャネル加算器103Rと、送受話検出部201と、話者位置検出部105と、共分散行列計算部104と、共分散行列記憶部106と、Lチャネルフィルタ係数計算部107Lと、Rチャネルフィルタ係数計算部107Rと、Lチャネルミキシング係数設定部203Lと、Rチャネルミキシング係数設定部203Rにより構成される。 The sound collection device of this embodiment comprises microphones 101 1 to 101 M , L channel filters 102 L1 to 102 LM , 301 LL , and 301 LR , R channel filters 102 R1 to 102 RM , 301 RL , and 301 RR , an L channel adder 103 L , an R channel adder 103 R , a transmission/reception detection unit 201, a speaker position detection unit 105, a covariance matrix calculation unit 104, a covariance matrix storage unit 106, an L channel filter coefficient calculation unit 107 L , an R channel filter coefficient calculation unit 107 R , an L channel mixing coefficient setting unit 203 L , and an R channel mixing coefficient setting unit 203 R .

本実施形態は、第3の実施形態の収音装置のLチャネルフィルタとRチャネルフィルタに受話信号をフィルタリングするフィルタ301LL、301LR、301RL、301RRを追加した構成であり、第3の実施形態よりもさらに高いエコー抑圧を実現する。 This embodiment adds filters 301 LL , 301 LR , 301 RL , and 301 RR for filtering the received signals to the L channel filters and R channel filters of the sound collection device of the third embodiment, and realizes even higher echo suppression than the third embodiment.

まず、送受話検出部201は、第3の実施形態と同様に受話区間と発話区間と雑音区間を検出する。   First, the transmission / reception detecting unit 201 detects a reception section, a speech section, and a noise section as in the third embodiment.

話者位置検出部105は、送受話検出部201で発話区間と判定された場合に、第1の実施形態と同様にして、話者位置を検出する。   The speaker position detection unit 105 detects the speaker position in the same manner as in the first embodiment when the transmission / reception detection unit 201 determines that it is an utterance section.

次に、共分散行列計算部104は、受話信号も含めた共分散行列   Next, the covariance matrix calculation unit 104 calculates a covariance matrix that also includes the received signals.

Figure 0004298466
Figure 0004298466

を算出する。マイクロホン受音信号の周波数領域変換信号をX1(ω)〜XM(ω)とし、LチャネルとRチャネルの受話信号の周波数領域変換信号をそれぞれZL(ω)とZR(ω)とする。これらの信号の共分散行列 Is calculated. Let the frequency-domain transforms of the microphone signals be X 1 (ω) to X M (ω), and let the frequency-domain transforms of the L channel and R channel received signals be Z L (ω) and Z R (ω), respectively. The covariance matrix of these signals

Figure 0004298466
Figure 0004298466

は、式(24)により算出される。 Is calculated by equation (24).

Figure 0004298466
Figure 0004298466
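Equation (24) is only an image in this extraction, but the surrounding text describes a covariance taken over the microphone spectra stacked together with the two received-channel spectra. A hedged sketch of that construction (shapes and names are assumptions of this example):

```python
import numpy as np

def augmented_covariance(X, Z_L, Z_R):
    """X: (N_FREQ, M) mic spectra; Z_L, Z_R: (N_FREQ,) received-signal spectra.

    Returns an (N_FREQ, M+2, M+2) array: for each bin, the outer product of the
    stacked vector [X_1 .. X_M, Z_L, Z_R] with its conjugate transpose.
    """
    v = np.concatenate([X, Z_L[:, None], Z_R[:, None]], axis=1)   # (N_FREQ, M+2)
    return v[:, :, None] * np.conj(v[:, None, :])
```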

次に、共分散行列記憶部106は、送受話検出部201と話者位置検出部105の検出結果に基づき、共分散行列   Next, based on the detection results of the transmission/reception detection unit 201 and the speaker position detection unit 105, the covariance matrix storage unit 106 stores the covariance matrix

Figure 0004298466
Figure 0004298466

を、各音源位置に対する共分散行列 Is the covariance matrix for each source location

Figure 0004298466
Figure 0004298466

と雑音の共分散行列 And noise covariance matrix

Figure 0004298466
Figure 0004298466

とエコーの共分散行列 And echo covariance matrix

Figure 0004298466
Figure 0004298466

として保存する。 Save as.

次に、Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を所望のレベル差で収音し、エコーと雑音を抑圧するためのフィルタ係数を計算する。まず、各マイクロホンに接続されたLチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMのフィルタ係数を周波数領域に変換したものを、それぞれHL,1(ω)〜HL,M(ω)とHR,1(ω)〜HR,M(ω)とし、LおよびRチャネル受話信号をフィルタリングするためのLチャネルフィルタをFL,L(ω)とFL,R(ω)とし、LおよびRチャネル受話信号をフィルタリングするためのRチャネルフィルタをFR,L(ω)とFR,R(ω)とする。次に、これらのフィルタ係数を式(25)と式(26)により行列としたものを Next, the L channel filter coefficient calculation unit 107 L and the R channel filter coefficient calculation unit 107 R calculate filter coefficients for picking up the sound emitted from each speaker with the desired level difference and for suppressing echo and noise. First, let the frequency-domain versions of the filter coefficients of the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM connected to the microphones be H L,1 (ω) to H L,M (ω) and H R,1 (ω) to H R,M (ω), let the L channel filters for filtering the L and R channel received signals be F L,L (ω) and F L,R (ω), and let the R channel filters for filtering the L and R channel received signals be F R,L (ω) and F R,R (ω). These filter coefficients are then arranged into matrices by equations (25) and (26)

Figure 0004298466
Figure 0004298466

とする。 And

Figure 0004298466
Figure 0004298466

ここで、Lチャネルフィルタ   Where L channel filter

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ And R channel filter

Figure 0004298466
Figure 0004298466

に要求される条件は、以下の3つである。 The following three conditions are required.

1つ目は、LチャネルとRチャネルの出力信号の各話者音声信号がi番目の話者位置(Xi,Yi,Zi)に対応した所望のレベル差PDiff(Xi,Yi,Zi)となる条件である。第1の実施形態と同様に、この条件は式(26)と式(27)で表される。 The first is the condition that, in the L channel and R channel output signals, each speaker's speech signal takes the desired level difference P Diff (X i , Y i , Z i ) corresponding to the i-th speaker position (X i , Y i , Z i ). As in the first embodiment, this condition is expressed by equations (26) and (27).

Figure 0004298466
Figure 0004298466

ただし、ミキシング係数行列   However, the mixing coefficient matrix

Figure 0004298466
Figure 0004298466

は、1〜M番目の要素が第1の実施形態のミキシング係数行列 Where the 1st to Mth elements are the mixing coefficient matrix of the first embodiment.

Figure 0004298466
Figure 0004298466

と同様に設定され、M+1とM+2番目の要素が0であるM+2行1列の行列である。 Is an M + 2 × 1 matrix where the M + 1 and M + 2th elements are 0.
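A small sketch of the extended mixing-coefficient vector just described, assuming the first M entries are the per-microphone mixing coefficients for speaker i and the two entries corresponding to the received-signal inputs are zero:

```python
import numpy as np

def extended_mixing_vector(a_mic):
    """a_mic: (M,) mixing coefficients -> (M+2,) vector with two zeros appended."""
    return np.concatenate([a_mic, np.zeros(2, dtype=a_mic.dtype)])
```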

2つ目は、雑音を抑圧する条件である。マイクロホン受音信号のエコー成分XE,1(ω)〜XE,M(ω)がフィルタ部に入力された場合に各チャネルの出力が0となる式(28)と式(29)が、その条件となる。 The second is the condition for suppressing noise: equations (28) and (29), which require the output of each channel to be 0 when the echo components X E,1 (ω) to X E,M (ω) of the microphone signals are input to the filter unit.

Figure 0004298466
Figure 0004298466

3つ目の条件は、エコー成分を抑圧する条件である。マイクロホン受音信号のエコー成分XE,1(ω)〜XE,M(ω)と受話信号ZL(ω)、ZR(ω)がフィルタ部に入力された場合に各チャネルの出力が0となる式(30)と式(31)がその条件となる。 The third is the condition for suppressing the echo component: equations (30) and (31), which require the output of each channel to be 0 when the echo components X E,1 (ω) to X E,M (ω) of the microphone signals and the received signals Z L (ω) and Z R (ω) are input to the filter unit.

Figure 0004298466
Figure 0004298466

次に、式(26)と式(28)と式(30)、式(27)と式(29)と式(31)の条件をフィルタ係数行列   Next, the conditions of equations (26), (28), and (30), and of equations (27), (29), and (31), are solved in the least-squares sense for the filter coefficient matrix

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(32)と式(33)となる。ただし、CSiは音源位置の感度拘束に対する重みの定数であり、値が大きくなるほど感度拘束が強くなる。CNは雑音抑圧に対する重みの定数であり、値が大きくなるほど雑音抑圧量が増加する。CEはエコー抑圧に対する重みの定数であり、値が大きくなるほどエコー抑圧量が増加する。 This yields equations (32) and (33). Here, C Si is a weighting constant for the sensitivity constraint at each sound source position; the larger its value, the stronger the sensitivity constraint. C N is a weighting constant for noise suppression; the larger its value, the greater the amount of noise suppression. C E is a weighting constant for echo suppression; the larger its value, the greater the amount of echo suppression.

Figure 0004298466
Figure 0004298466

以上で、雑音とエコーを抑圧し、所望のレベル差を得るためのフィルタ係数を求める式(32)と式(33)を導出した。   Thus, the equations (32) and (33) for obtaining the filter coefficient for suppressing the noise and the echo and obtaining a desired level difference are derived.

次に、式(32)と式(33)により求められた、Lチャネルフィルタ係数   Next, the L channel filter coefficient obtained by Expression (32) and Expression (33)

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ係数 And R channel filter coefficients

Figure 0004298466
Figure 0004298466

は、Lチャネルフィルタ102L1〜102LM、301LL、301LRと、Rチャネルフィルタ102R1〜102RM、301RL、301RRにそれぞれコピーされ、マイクロホン受音信号と受話信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 Are copied to the L channel filters 102 L1 to 102 LM , 301 LL and 301 LR and the R channel filters 102 R1 to 102 RM , 301 RL and 301 RR , respectively, and filter the microphone sound reception signal and the reception signal, respectively. The filtered signals are added by adders 103 L and 103 R for each channel, and output as a stereo output signal.
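A minimal sketch of the augmented filter-and-add stage of this embodiment, in which the M microphone spectra and the two received-signal spectra are filtered together before the per-channel summation (shapes and names are assumptions of this example):

```python
import numpy as np

def filter_and_add_with_reference(X, Z_L, Z_R, W_L, W_R):
    """X: (N_FREQ, M) mic spectra; Z_L, Z_R: (N_FREQ,) received spectra.

    W_L, W_R: (N_FREQ, M+2) filters; the first M columns play the role of
    H_{L,m} / H_{R,m} and the last two that of F_{L,L}, F_{L,R} / F_{R,L}, F_{R,R}.
    """
    v = np.concatenate([X, Z_L[:, None], Z_R[:, None]], axis=1)   # (N_FREQ, M+2)
    y_L = np.sum(W_L * v, axis=1)
    y_R = np.sum(W_R * v, axis=1)
    return y_L, y_R
```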

以上示したように、本実施形態では、複数マイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。また、第3の実施形態の収音装置に受話信号をフィルタリングするフィルタを追加したことにより、第3の実施形態の収音装置よりも高いエコー抑圧が実現する。   As described above, in this embodiment, the reception, noise, and transmission intervals are detected from the signals received by the plurality of microphones; the speaker position is detected during the transmission interval; covariance matrices are obtained; filter coefficients are computed that suppress echo and noise and give the desired level difference between the L and R channels for each speaker position; and the microphone signals are filtered per channel with these coefficients. As a result, echo and noise are suppressed and a stereo output signal having the desired level difference for each speaker position is obtained. Furthermore, by adding filters for filtering the received signals to the sound collection device of the third embodiment, higher echo suppression is realized than with the sound collection device of the third embodiment.

[第5の実施形態]
図4は本発明の第5の実施形態の収音装置の要部のブロック図である。
[Fifth Embodiment]
FIG. 4 is a block diagram of a main part of a sound collecting apparatus according to the fifth embodiment of the present invention.

本実施形態の収音装置は、第1〜4の実施形態のいずれかの収音装置に話者音声レベル推定部108とゲイン算出部109を追加した構成である。話者音声レベル推定部108は、各話者の共分散行列より各話者の音声レベルを推定し、ゲイン算出部109は、各話者の音声レベルから、各話者の出力音声レベルが適正レベルとなるゲインを算出する。これにより、全ての話者の音声レベルを適正レベルとし、聞き取りやすい音量での収音を実現する。また、第1〜4の実施形態のいずれかの収音装置と同様に、エコー抑圧、雑音抑圧、チャネル間のレベル差をつけた良好な音像定位も同時に実現する。   The sound collection device of this embodiment has a configuration in which a speaker voice level estimation unit 108 and a gain calculation unit 109 are added to any of the sound collection devices of the first to fourth embodiments. The speaker voice level estimation unit 108 estimates the voice level of each speaker from that speaker's covariance matrix, and the gain calculation unit 109 calculates, from each speaker's voice level, a gain that brings that speaker's output voice level to the appropriate level. As a result, the voice levels of all speakers are brought to an appropriate level, realizing sound pickup at a volume that is easy to hear. In addition, as in the sound collection devices of the first to fourth embodiments, echo suppression, noise suppression, and good sound image localization with a level difference between the channels are realized at the same time.

まず、送受話検出部201、話者位置検出部105、共分散行列計算部104、共分散記憶部106については、第1から第4の実施形態のいずれかの収音装置と同様な処理となる。   First, the transmission/reception detection unit 201, the speaker position detection unit 105, the covariance matrix calculation unit 104, and the covariance matrix storage unit 106 operate in the same manner as in any of the sound collection devices of the first to fourth embodiments.

話者音声レベル推定部108は、各話者の音声レベルPSiを、共分散行列記憶部106に記憶されている各話者位置に対する共分散行列 The speaker voice level estimation unit 108 estimates the voice level P Si of each speaker from the covariance matrix, stored in the covariance matrix storage unit 106, for each speaker position

Figure 0004298466
Figure 0004298466

と、話者位置ごとのシングルチャネルミキシング係数行列 And a single-channel mixing coefficient matrix for each speaker location

Figure 0004298466
Figure 0004298466

から式(34)または式(35)を用いて求められる。シングルチャネルミキシング係数行列 Is obtained using equation (34) or equation (35). Single channel mixing coefficient matrix

Figure 0004298466
Figure 0004298466

には、例えばミキシング係数行列 For example, a mixing coefficient matrix

Figure 0004298466
Figure 0004298466

を加算平均した行列、またはミキシング係数行列 a matrix obtained by averaging them, or the mixing coefficient matrix

Figure 0004298466
Figure 0004298466

を加算平均した行列を用いる。 A matrix obtained by averaging is used.

Figure 0004298466
Figure 0004298466

次に、ゲイン算出部109は、各話者音声レベルPSiを適正レベルPopt(聞き取りやすいレベルで、あらかじめ設定される)にするためのゲインASiを算出する。ゲインASiは、式(36)により求めることができる。 Next, the gain calculation unit 109 calculates a gain A Si for setting each speaker voice level P Si to an appropriate level P opt (a level that is easy to hear and set in advance). Gain A Si can be obtained by equation (36).

Figure 0004298466
Figure 0004298466
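Equations (34) to (36) appear only as images here; the sketch below shows one plausible reading of the level-alignment step described above: the per-speaker power is read off the stored covariance through a single-channel mixing vector, and a gain then scales that power to the preset target level P opt. The function names and the exact quadratic form are assumptions of this example.

```python
import numpy as np

def speaker_level(R_speaker, m):
    """R_speaker: (N_FREQ, M, M) stored covariance; m: (M,) single-channel mixing vector.

    Returns the estimated speaker power, summed over frequency, as m^H R(w) m.
    """
    return float(np.real(np.einsum("i,fij,j->", np.conj(m), R_speaker, m)))

def level_gain(p_speaker, p_opt):
    """Amplitude gain that brings the estimated speaker power to the target power."""
    return float(np.sqrt(p_opt / max(p_speaker, 1e-12)))
```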

Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とし、エコーと雑音を抑圧するためのフィルタ係数を計算する。エコーと雑音を抑圧する条件は、第1〜4の実施形態のいずれかの収音装置と同様である。各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とする条件は、式(37)と式(38)、または式(39)と式(40)で表される。 The L channel filter coefficient calculation unit 107 L and the R channel filter coefficient calculation unit 107 R calculate filter coefficients for picking up the sound emitted from each speaker at the appropriate level, giving the sound emitted from each speaker the desired level difference between the L and R channels, and suppressing echo and noise. The conditions for suppressing echo and noise are the same as in any of the sound collection devices of the first to fourth embodiments. The conditions for picking up the sound emitted from each speaker at the appropriate level and giving it the desired level difference between the L and R channels are expressed by equations (37) and (38), or by equations (39) and (40).

Figure 0004298466
Figure 0004298466

次に、第1〜4の実施形態のいずれかの収音装置のエコーと雑音を抑圧する条件と、各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とする条件である式(37)と式(38)、または式(39)と式(40)を、フィルタ係数行列   Next, the echo and noise suppression conditions of any of the sound collection devices of the first to fourth embodiments, together with equations (37) and (38), or equations (39) and (40), which are the conditions for picking up the sound emitted from each speaker at the appropriate level and giving it the desired level difference between the L and R channels, are solved for the filter coefficient matrix

Figure 0004298466
Figure 0004298466

または Or

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(41)と式(42)、または式(43)と式(44)、または式(45)と式(46)、または式(47)と式(48)となる。 Solving in the least-squares sense yields equations (41) and (42), equations (43) and (44), equations (45) and (46), or equations (47) and (48).

Figure 0004298466
Figure 0004298466

ただし、式(41)と式(42)は各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とする条件で求めたフィルタ係数、式(43)と式(44)は各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とし、雑音を抑圧する条件で求めたフィルタ係数、式(45)と式(46)は、各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とし、エコーと雑音を抑圧する条件で求めたフィルタ係数、式(47)と式(48)は、受話信号をフィルタリングするフィルタを持ち、各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とし、エコーと雑音を抑圧する条件で求めたフィルタ係数である。   Here, equations (41) and (42) are the filter coefficients obtained under the condition that the sound emitted from each speaker is picked up at the appropriate level and given the desired level difference between the L and R channels; equations (43) and (44) are the filter coefficients obtained under that condition with noise suppression added; equations (45) and (46) are the filter coefficients obtained under that condition with echo and noise suppression added; and equations (47) and (48) are the filter coefficients obtained, with filters for filtering the received signals, under that condition with echo and noise suppression added.

次に、式(41)と式(42)、または式(43)と式(44)、または式(45)と式(46)、または式(47)と式(48)により求められた、Lチャネルフィルタ係数とRチャネルフィルタ係数は、Lチャネルフィルタ102L1〜102LM、301LL、301LRと、Rチャネルフィルタ102R1〜102RM、301RL、301RRにそれぞれコピーされ、マイクロホン受音信号と受話信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 Next, the L channel filter coefficients and R channel filter coefficients obtained from equations (41) and (42), equations (43) and (44), equations (45) and (46), or equations (47) and (48) are copied to the L channel filters 102 L1 to 102 LM , 301 LL , and 301 LR and to the R channel filters 102 R1 to 102 RM , 301 RL , and 301 RR , respectively, and filter the microphone signals and the received signals. The filtered signals are added by the adders 103 L and 103 R for each channel and output as a stereo output signal.

以上示したように、本実施形態では、複数のマイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与え、各話者から発せられた音を適正レベルで収音するフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、各話者から発せられた音を適正レベルで収音し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。   As described above, in this embodiment, the reception, noise, and transmission intervals are detected from the signals received by the plurality of microphones; the speaker position is detected during the transmission interval; covariance matrices are obtained; filter coefficients are computed that suppress echo and noise, give the desired level difference between the L and R channels for each speaker position, and pick up the sound emitted from each speaker at the appropriate level; and the microphone signals are filtered per channel with these coefficients. As a result, echo and noise are suppressed, the sound emitted from each speaker is picked up at the appropriate level, and a stereo output signal having the desired level difference for each speaker position is obtained.

[第6の実施形態]
図5は本発明の第6の実施形態の収音装置の要部のブロック図である。
[Sixth Embodiment]
FIG. 5 is a block diagram of a main part of a sound collecting apparatus according to the sixth embodiment of the present invention.

本実施形態の収音装置は、第1〜5の実施形態のいずれかの収音装置に白色化部110を追加した構成である。白色化部110は、各共分散行列をLRチャネルフィルタ係数計算部107L,107Rの前段階で白色化する。これにより、マイクロホンで受音された信号の周波数特性に関与しないフィルタ係数が求められ、安定した動作が可能となる。 The sound collection device of this embodiment has a configuration in which a whitening unit 110 is added to any one of the sound collection devices of the first to fifth embodiments. The whitening unit 110 whitens each covariance matrix before it is passed to the L and R channel filter coefficient calculation units 107 L and 107 R . As a result, filter coefficients that do not depend on the frequency characteristics of the signals received by the microphones are obtained, enabling stable operation.

白色化は、共分散行列   Whitening is the covariance matrix

Figure 0004298466
Figure 0004298466

の対角成分のうち最もパワーの大きいRkk(ω)を平均化する白色化ゲイン Whitening gain that averages R kk (ω) with the largest power among the diagonal components of

Figure 0004298466
Figure 0004298466

を乗算するか、共分散行列の対角成分の平均パワーを平均化する白色化ゲイン Or a whitening gain that averages the mean power of the diagonal components of the covariance matrix

Figure 0004298466
Figure 0004298466

を乗算することで行う。これらは、それぞれ式(49)または式(50)と、式(51)または式(52)により表される。 This is done by multiplying by one of these gains. They are expressed by equations (49) or (50) and by equations (51) or (52), respectively.

Figure 0004298466
Figure 0004298466

ただし、βは白色化の度合いを調整する係数であり、1となれば完全な白色化となり、0となれば白色化は行われなくなる。   Here, β is a coefficient that adjusts the degree of whitening: a value of 1 gives complete whitening, and a value of 0 gives no whitening.
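Equations (49) to (52) are images in this extraction; the sketch below is one plausible reading of the whitening step: a per-frequency gain that flattens either the largest diagonal entry or the mean diagonal power of the stored covariance, with beta between 0 and 1 controlling how complete the whitening is. The exponent form of the gain is an assumption of this example.

```python
import numpy as np

def whiten_covariance(R, beta=1.0, use_max=True, eps=1e-12):
    """R: (N_FREQ, M, M) stored covariance; returns a whitened copy."""
    diag = np.real(np.einsum("fii->fi", R))              # diagonal powers per bin
    p = diag.max(axis=1) if use_max else diag.mean(axis=1)
    gain = (p + eps) ** (-beta)                          # beta=1: full whitening, 0: none
    return R * gain[:, None, None]
```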

第5の実施形態の収音装置では、共分散行列の白色化により、音源の周波数特性に依存しないフィルタを求めることができる。これにより、音源の周波数特性が変化しても、フィルタ係数の変化がなく、本発明の処理による音色の変化を防ぐことができる。   In the sound collection device of the fifth embodiment, a filter independent of the frequency characteristics of the sound source can be obtained by whitening the covariance matrix. Thereby, even if the frequency characteristic of the sound source changes, there is no change in the filter coefficient, and the change in timbre due to the processing of the present invention can be prevented.

これら以外の部分に関しては、第1〜5の実施形態のいずれかの収音装置と同じであるので、説明を省略する。   Since other parts are the same as those of any of the sound collecting apparatuses of the first to fifth embodiments, the description thereof is omitted.

[第7〜9の実施形態]
図6〜図8はそれぞれ本発明の第7、第8、第9の実施形態の収音装置のブロック図である。
[Seventh to ninth embodiments]
6 to 8 are block diagrams of sound collecting apparatuses according to seventh, eighth and ninth embodiments of the present invention, respectively.

本実施形態の収音装置は、第1〜6の実施形態のいずれかの収音装置にFFT部4011〜401M、501L、501RとIFFT部402L、402Rを追加した構成である。FFT部4011〜401Mは、各マイクロホン受音信号をそれぞれFFTして、時間領域信号から周波数領域信号に変換する。FFT部501L、501Rは、LRチャネル受話信号をそれぞれFFTして、時間領域信号から周波数領域信号に変換する。IFFT部402L、402Rは、LRチャネル加算器103L、103Rの出力信号をそれぞれIFFTして周波数領域信号から時間領域信号に変換する。また、第1〜第6の実施形態の各処理部は全て周波数領域での計算とし、時間領域の演算に比べ低演算量を実現する。 The sound collection device of this embodiment has a configuration in which FFT units 401 1 to 401 M , 501 L , and 501 R and IFFT units 402 L and 402 R are added to any of the sound collection devices of the first to sixth embodiments. The FFT units 401 1 to 401 M apply an FFT to each microphone signal to convert it from a time-domain signal to a frequency-domain signal. The FFT units 501 L and 501 R apply an FFT to the L and R channel received signals to convert them from time-domain signals to frequency-domain signals. The IFFT units 402 L and 402 R apply an IFFT to the output signals of the L and R channel adders 103 L and 103 R to convert them from frequency-domain signals to time-domain signals. All processing units of the first to sixth embodiments then operate in the frequency domain, which requires less computation than operating in the time domain.
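A minimal sketch of this frequency-domain wrapper, assuming a windowed frame FFT, per-bin filtering as in the earlier embodiments, and an inverse FFT back to the time domain; the frame length and window are assumptions of this example.

```python
import numpy as np

FRAME = 512                 # frame length (assumed)
window = np.hanning(FRAME)  # analysis window (assumed)

def process_frame(x_frame, H_L, H_R):
    """x_frame: (FRAME, M) time-domain mic samples; H_L, H_R: (FRAME//2+1, M) filters."""
    X = np.fft.rfft(window[:, None] * x_frame, axis=0)   # FFT stage per microphone
    y_L = np.sum(H_L * X, axis=1)                        # filtering + L channel adder
    y_R = np.sum(H_R * X, axis=1)                        # filtering + R channel adder
    return np.fft.irfft(y_L, n=FRAME), np.fft.irfft(y_R, n=FRAME)   # IFFT stage
```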

これら以外の部分に関しては、第1〜6の実施形態のいずれかの収音装置と同じであるので、説明を省略する。   Since other parts are the same as those of any of the sound collecting apparatuses of the first to sixth embodiments, the description thereof is omitted.

[第10の実施形態]
本発明の第10の実施形態について説明する。
[Tenth embodiment]
A tenth embodiment of the present invention will be described.

本実施形態の収音装置は、第1〜9の実施形態のいずれかの収音装置の出力チャネル数を3チャネル以上のJチャネルとしたものである。LとRチャネルフィルタ係数計算部107L、107RとLとRチャネルフィルタ102L1〜102LM、301LL、301LR、102R1〜102RM、301RL、301RRとLとRチャネル加算器103L、103Rを、3チャネル以上の1〜Jチャネルフィルタ係数計算手段と、1〜Jチャネルフィルタ手段と、1〜Jチャネル加算手段に置き換えた構成である。 The sound collection device of this embodiment sets the number of output channels of any of the sound collection devices of the first to ninth embodiments to J channels, J being three or more. In this configuration, the L and R channel filter coefficient calculation units 107 L and 107 R , the L and R channel filters 102 L1 to 102 LM , 301 LL , 301 LR , 102 R1 to 102 RM , 301 RL , and 301 RR , and the L and R channel adders 103 L and 103 R are replaced with 1st to J-th channel filter coefficient calculation means, 1st to J-th channel filter means, and 1st to J-th channel addition means for three or more channels.

出力チャネルを増加させたことにより、3チャネル以上の多チャンネルでの収音が可能となる。   By increasing the number of output channels, it is possible to collect sound on multiple channels of three or more channels.
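A small sketch of this J-channel generalization: instead of two filter banks and two adders, a (J, M) filter matrix per frequency bin produces all J output channels in one step (shapes and names are assumptions of this example).

```python
import numpy as np

def filter_and_add_multichannel(X, W):
    """X: (N_FREQ, M) mic spectra; W: (N_FREQ, J, M) filters -> (N_FREQ, J) outputs."""
    return np.einsum("fjm,fm->fj", W, X)
```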

これら以外の部分に関しては、第1〜9の実施形態のいずれかの収音装置と同じであるので、説明を省略する。   Since other parts are the same as those of any of the sound collecting apparatuses of the first to ninth embodiments, the description thereof is omitted.

なお、本発明は専用のハードウェアにより実現されるもの以外に、その機能を実現するためのプログラムを、コンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行するものであってもよい。コンピュータ読み取り可能な記録媒体とは、フロッピーディスク、光磁気ディスク、CD−ROM等の記録媒体、コンピュータシステムに内蔵されるハードディスク装置等の記憶装置を指す。さらに、コンピュータ読み取り可能な記録媒体は、インターネットを介してプログラムを送信する場合のように、短時間の間、動的にプログラムを保持するもの(伝送媒体もしくは伝送波)、その場合のサーバとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含む。   The present invention may be realized not only by dedicated hardware but also by recording a program for realizing its functions on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. A computer-readable recording medium refers to a recording medium such as a floppy disk, a magneto-optical disk, or a CD-ROM, or a storage device such as a hard disk drive built into the computer system. Furthermore, a computer-readable recording medium also includes one that holds the program dynamically for a short time (a transmission medium or transmission wave), as when the program is transmitted over the Internet, and one that holds the program for a certain period, such as the volatile memory inside the computer system that serves as the server in that case.

本発明の第1および第2の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection apparatus of the 1st and 2nd embodiment of this invention.
本発明の第3の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 3rd Embodiment of this invention.
本発明の第4の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 4th Embodiment of this invention.
本発明の第5の実施形態の収音装置の要部を示すブロック図である。It is a block diagram which shows the principal part of the sound collection device of the 5th Embodiment of this invention.
本発明の第6の実施形態の収音装置の要部を示すブロック図である。It is a block diagram which shows the principal part of the sound collection device of the 6th Embodiment of this invention.
本発明の第7の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 7th Embodiment of this invention.
本発明の第8の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 8th Embodiment of this invention.
本発明の第9の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 9th Embodiment of this invention.
本発明により形成される指向性を示す図である。It is a figure which shows the directivity formed by this invention.
従来の収音方法を説明する図である。It is a figure explaining the conventional sound collection method.

符号の説明Explanation of symbols

1011〜101M マイクロホン
102L1〜102LM、301LL、301LR Lチャネルフィルタ
102R1〜102RM、301RL、301RR Rチャネルフィルタ
103L Lチャネル加算器
103R Rチャネル加算器
104 共分散行列計算部
105 話者位置検出部
106 共分散行列記憶部
107L Lチャネルフィルタ係数計算部
107R Rチャネルフィルタ係数計算部
108 話者音声レベル推定部
109 ゲイン計算部
110 白色化部
201 送受話検出部
202L Lチャネルスピーカ
202R Rチャネルスピーカ
203L Lチャネルミキシング係数設定部
203R Rチャネルミキシング係数設定部
4011〜401M、501L、501R FFT
402L、402R IFFT
901L、901R 従来技術の指向性マイクロホン
902L、902R 本発明により形成される指向特性
903 本発明の処理
101 1 to 101 M microphones 102 L1 to 102 LM , 301 LL , 301 LR L channel filters 102 R1 to 102 RM , 301 RL , 301 RR R channel filters 103 L L channel adders 103 R R channel adders 104 covariance matrix Calculation unit 105 Speaker position detection unit 106 Covariance matrix storage unit 107 L L channel filter coefficient calculation unit 107 R R channel filter coefficient calculation unit 108 Speaker voice level estimation unit 109 Gain calculation unit 110 Whitening unit 201 Transmission / reception detection unit 202 L L channel speaker 202 R R channel speaker 203 L L channel mixing coefficient setting unit 203 R R channel mixing coefficient setting unit 401 1 to 401 M , 501 L , 501 R FFT
402 L , 402 R IFFT
901 L , 901 R Conventional directional microphones 902 L , 902 R Directional characteristics formed by the present invention 903 Processing of the present invention

Claims (18)

収音方法であって、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階と
を有する収音方法。
A sound collection method,
A speaker position detection stage for detecting a speaker position from a received sound signal received by each of a plurality of sound pickup means;
A covariance matrix calculation stage for calculating a covariance matrix from the received sound signal received by each of the plurality of sound collection means;
Storing a covariance matrix for each speaker position;
An L channel mixing coefficient setting step for presetting an L channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix and the L channel mixing coefficient under the condition that each speaker voice component received by each of the plurality of sound pickup means is mixed by the L channel mixing coefficient corresponding to each speaker position. Calculating the L channel filter coefficient from the L channel filter coefficient calculating stage;
An R channel mixing coefficient setting stage for presetting an R channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix and the R channel mixing coefficient under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed by the R channel mixing coefficient corresponding to each speaker position. Calculating an R channel filter coefficient from the R channel filter coefficient calculating stage;
An L channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the L channel filter coefficient;
An R channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
An L channel addition stage for adding the output signals of the L channel filter stage;
And an R channel addition step of adding the output signals of the R channel filter step.
収音方法であって、
複数の収音手段の各々で受音された受音信号から話者位置と雑音区間を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階と
を有する収音方法。
A sound collection method,
A speaker position detection stage for detecting a speaker position and a noise section from a received signal received by each of a plurality of sound pickup means;
A covariance matrix calculation stage for calculating a covariance matrix from the received sound signal received by each of the plurality of sound collection means;
Storing a covariance matrix for each noise interval and speaker position;
An L channel mixing coefficient setting step for presetting an L channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position and the noise component is suppressed. And an L channel filter coefficient calculation step of calculating an L channel filter coefficient from the L channel mixing coefficient,
An R channel mixing coefficient setting stage for presetting an R channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, and the noise component is suppressed. And an R channel filter coefficient calculating step of calculating an R channel filter coefficient from the R channel mixing coefficient;
An L channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the L channel filter coefficient;
An R channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
An L channel addition stage for adding the output signals of the L channel filter stage;
And an R channel addition step of adding the output signals of the R channel filter step.
収音方法であって、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出段階と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階と
を有する収音方法。
A sound collection method,
A transmission / reception detection stage for detecting a transmission interval, a reception interval, and a noise interval from a reception signal received by each of a plurality of sound collection means, an L channel reception signal and an R channel reception signal from a communication partner;
A speaker position detection stage for detecting a speaker position from a received sound signal received by each of a plurality of sound pickup means;
A covariance matrix calculation stage for calculating a covariance matrix from the received sound signal received by each of the plurality of sound collection means;
Storing a covariance matrix for each reception interval, noise interval, and speaker position;
An L channel mixing coefficient setting step for presetting an L channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. Calculating an L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient;
An R channel mixing coefficient setting stage for presetting an R channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. An R channel filter coefficient calculation step of calculating an R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient;
An L channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the L channel filter coefficient;
An R channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
An L channel addition stage for adding the output signals of the L channel filter stage;
And an R channel addition step of adding the output signals of the R channel filter step.
収音方法であって、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出段階と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
あらかじめ各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階と
を有する収音方法。
A sound collection method,
A transmission / reception detection stage for detecting a transmission interval, a reception interval, and a noise interval from a reception signal received by each of a plurality of sound collection means, an L channel reception signal and an R channel reception signal from a communication partner;
A speaker position detection stage for detecting a speaker position from a received sound signal received by each of a plurality of sound pickup means;
A covariance matrix calculating step of calculating a covariance matrix from the received sound signal received by each of the plurality of sound collecting means, the L channel received signal, and the R channel received signal;
Storing a covariance matrix for each reception interval, noise interval, and speaker position;
An L channel mixing coefficient setting step for presetting an L channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. Calculating an L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient;
An R channel mixing coefficient setting step for setting in advance an R channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. An R channel filter coefficient calculation step of calculating an R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient;
An L channel filter stage for filtering a received sound signal, an L channel received signal, and an R channel received signal received by each of the plurality of sound collecting means, respectively, with the L channel filter coefficient;
An R channel filter step of filtering a received sound signal, an L channel received signal, and an R channel received signal received by each of the plurality of sound collecting means, respectively, with the R channel filter coefficient;
An L channel addition stage for adding the output signals of the L channel filter stage;
And an R channel addition step of adding the output signals of the R channel filter step.
前記記憶された各話者の共分散行列から各話者の音声レベルを推定する話者音声レベル推定段階と、
前記各話者の音声レベルから、各話者音声が適正レベルで出力されるための各話者に対するゲインを各々算出するゲイン算出部とをさらに有し、
前記Lチャネルフィルタ係数計算段階は、さらに前記各話者に対するゲインが乗算され、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出し、
前記Rチャネルフィルタ係数計算段階は、さらに前記各話者に対するゲインが乗算され、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出する、
請求項1から4のいずれかに記載の収音方法。
A speaker voice level estimating step for estimating a voice level of each speaker from the stored covariance matrix of each speaker;
A gain calculating unit for calculating a gain for each speaker for outputting each speaker's voice at an appropriate level from the voice level of each speaker;
In the L channel filter coefficient calculation step, the gain for each speaker is further multiplied, the received signal component is suppressed, and the noise component is suppressed. From the stored covariance matrix and the L channel mixing coefficient, L channel filter coefficient is calculated,
In the R channel filter coefficient calculation step, the gain for each speaker is further multiplied, the received signal component is suppressed, and the noise component is suppressed, from the stored covariance matrix and the R channel mixing coefficient. Calculating R channel filter coefficients;
The sound collection method according to any one of claims 1 to 4.
前記記憶された共分散行列のうち対角成分で最もパワーの大きい成分、または前記記憶された共分散行列の対角成分の加算値の周波数特性を平滑化するゲインを、前記記憶された共分散行列に乗算し、白色化された共分散行列を、前記Lチャネルフィルタ係数計算段階と前記Rチャネルフィルタ係数計算段階に入力する白色化段階をさらに有する、請求項1から5のいずれかに記載の収音方法。   The sound collection method according to any one of claims 1 to 5, further comprising a whitening step of multiplying the stored covariance matrix by a gain that smooths the frequency characteristic of either the diagonal component having the largest power among the diagonal components of the stored covariance matrix or the sum of the diagonal components of the stored covariance matrix, and inputting the whitened covariance matrix to the L channel filter coefficient calculation step and the R channel filter coefficient calculation step. 前記複数の収音手段の各々で受音された信号および前記受話信号を時間領域信号から周波数領域信号に変換するFFT段階と、
前記Lチャネル加算段階と前記Rチャネル加算段階の出力信号を周波数領域信号から時間領域信号に変換するIFFT段階をさらに有し、
前記各段階は周波数領域で演算する、
請求項1から6のいずれかに記載の収音方法。
An FFT stage for converting a signal received by each of the plurality of sound pickup means and the received signal from a time domain signal to a frequency domain signal;
An IFFT stage for converting the output signal of the L channel addition stage and the R channel addition stage from a frequency domain signal to a time domain signal;
Each of the above steps is performed in the frequency domain,
The sound collection method according to any one of claims 1 to 6.
前記LおよびRチャネルフィルタ係数計算段階と前記LおよびRチャネルフィルタ段階と前記LおよびRチャネル加算段階を、3チャネル以上の1〜Jチャネルフィルタ係数計算段階と1〜Jチャネルフィルタ段階と1〜Jチャネル加算段階に置き換えた、
請求項1から7のいずれかに記載の収音方法。
The sound collection method according to any one of claims 1 to 7, wherein the L and R channel filter coefficient calculation stages, the L and R channel filter stages, and the L and R channel addition stages are replaced with 1st to J-th channel filter coefficient calculation stages, 1st to J-th channel filter stages, and 1st to J-th channel addition stages for three or more channels.
収音装置であって、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出手段と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算手段と、
前記共分散行列を話者位置ごとに記憶する共分散行列記憶手段と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算手段と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ手段と、
前記Lチャネル複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ手段と、
前記Lチャネルフィルタ手段の出力信号を加算するLチャネル加算手段と、
前記Rチャネルフィルタ手段の出力信号を加算するRチャネル加算手段と
を有する収音装置。
A sound collecting device,
Speaker position detecting means for detecting a speaker position from a received sound signal received by each of a plurality of sound collecting means;
A covariance matrix calculating means for calculating a covariance matrix from the received sound signals received by each of the plurality of sound collecting means;
Covariance matrix storage means for storing the covariance matrix for each speaker position;
L channel mixing coefficient setting means for presetting an L channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix and the L channel mixing coefficient under the condition that each speaker voice component received by each of the plurality of sound pickup means is mixed by the L channel mixing coefficient corresponding to each speaker position. L channel filter coefficient calculating means for calculating L channel filter coefficients from:
R channel mixing coefficient setting means for setting in advance an R channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix and the R channel mixing coefficient under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed by the R channel mixing coefficient corresponding to each speaker position. R channel filter coefficient calculating means for calculating R channel filter coefficients from
L channel filter means for filtering received sound signals received by each of the plurality of sound collection means, respectively, with the L channel filter coefficients;
R channel filter means for filtering the received sound signal received by each of the L channel plural sound collecting means with the R channel filter coefficient;
L channel addition means for adding the output signals of the L channel filter means;
A sound collecting device comprising: R channel adding means for adding the output signals of the R channel filter means.
収音装置であって、
複数の収音手段の各々で受音された受音信号から話者位置と雑音区間を検出する話者位置検出手段と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算手段と、
前記共分散行列を雑音区間と話者位置ごとに記憶する共分散行列記憶手段と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算手段と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ手段と、
前記Lチャネルフィルタ手段の出力信号を加算するLチャネル加算手段と、
前記Rチャネルフィルタ手段の出力信号を加算するRチャネル加算手段と
を有する収音装置。
A sound collecting device,
Speaker position detecting means for detecting a speaker position and a noise section from a received sound signal received by each of a plurality of sound collecting means;
A covariance matrix calculating means for calculating a covariance matrix from the received sound signals received by each of the plurality of sound collecting means;
Covariance matrix storage means for storing the covariance matrix for each noise interval and speaker position;
L channel mixing coefficient setting means for presetting an L channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position and the noise component is suppressed. And L channel filter coefficient calculation means for calculating an L channel filter coefficient from the L channel mixing coefficient,
R channel mixing coefficient setting means for setting in advance an R channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, and the noise component is suppressed. And R channel filter coefficient calculating means for calculating an R channel filter coefficient from the R channel mixing coefficient,
L channel filter means for filtering received sound signals received by each of the plurality of sound collection means, respectively, with the L channel filter coefficients;
R channel filter means for filtering the received sound signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
L channel addition means for adding the output signals of the L channel filter means;
A sound collecting device comprising: R channel adding means for adding the output signals of the R channel filter means.
収音装置であって、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から送話区間、受話区間、雑音区間を検出する送受話検出手段と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出手段と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算手段と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶手段と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算手段と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ手段と、
前記Lチャネルフィルタ手段の出力信号を加算するLチャネル加算手段と、
前記Rチャネルフィルタ手段の出力信号を加算するRチャネル加算手段とを有する収音装置。
A sound collecting device,
A transmission / reception detecting means for detecting a transmission section, a reception section, and a noise section from a reception signal received by each of a plurality of sound collection means, an L channel reception signal and an R channel reception signal from a communication partner;
Speaker position detecting means for detecting a speaker position from a received sound signal received by each of a plurality of sound collecting means;
A covariance matrix calculating means for calculating a covariance matrix from the received sound signals received by each of the plurality of sound collecting means;
Covariance matrix storage means for storing the covariance matrix for each reception interval, noise interval, and speaker position;
L channel mixing coefficient setting means for presetting an L channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. L channel filter coefficient calculation means for calculating an L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient;
R channel mixing coefficient setting means for setting in advance an R channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. R channel filter coefficient calculating means for calculating an R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient;
L channel filter means for filtering received sound signals received by each of the plurality of sound collection means, respectively, with the L channel filter coefficients;
R channel filter means for filtering the received sound signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
L channel addition means for adding the output signals of the L channel filter means;
A sound collecting device comprising: R channel adding means for adding the output signals of the R channel filter means.
収音装置であって、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出手段と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出手段と、
複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号から共分散行列を計算する共分散行列計算手段と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶手段と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算手段と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算手段と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ手段と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ手段と、
前記Lチャネルフィルタ手段の出力信号を加算するLチャネル加算手段と、
前記Rチャネルフィルタ手段の出力信号を加算するRチャネル加算手段とを有する収音装置。
A sound collecting device,
A transmission / reception detecting means for detecting a transmission interval, a reception interval, and a noise interval from a received sound signal received by each of a plurality of sound collection means, and an L channel reception signal and an R channel reception signal from a communication partner;
Speaker position detecting means for detecting a speaker position from a received sound signal received by each of a plurality of sound collecting means;
Covariance matrix calculating means for calculating a covariance matrix from the received sound signal received by each of the plurality of sound collecting means, the L channel received signal, and the R channel received signal;
Covariance matrix storage means for storing the covariance matrix for each reception interval, noise interval, and speaker position;
L channel mixing coefficient setting means for presetting an L channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. L channel filter coefficient calculating means for calculating an L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient;
R channel mixing coefficient setting means for setting in advance an R channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. R channel filter coefficient calculating means for calculating an R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient;
L channel filter means for filtering the received sound signal, the L channel received signal, and the R channel received signal received by each of the plurality of sound collecting means, respectively, with the L channel filter coefficient;
R channel filter means for filtering the received sound signal, the L channel received signal, and the R channel received signal received by each of the plurality of sound collecting means, respectively, with the R channel filter coefficient;
L channel addition means for adding the output signals of the L channel filter means;
A sound collecting device comprising: R channel adding means for adding the output signals of the R channel filter means.
Speaker voice level estimating means for estimating the voice level of each speaker from the stored covariance matrix of each speaker;
A gain calculating unit for calculating, from the voice level of each speaker, a gain for each speaker such that each speaker's voice is output at an appropriate level;
The L channel filter coefficient calculating means calculating the L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient under the further condition that the gain for each speaker is multiplied, the received signal component is suppressed, and the noise component is suppressed;
The R channel filter coefficient calculating means calculating the R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient under the further condition that the gain for each speaker is multiplied, the received signal component is suppressed, and the noise component is suppressed;
The sound collecting device according to any one of claims 9 to 12.
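
As a rough sketch of the speaker voice level estimating means and the gain calculating unit above: the diagonal of a stored speaker covariance matrix gives that speaker's average power per channel, from which a level-aligning gain can be derived. The target level, the restriction to the microphone channels, and the names below are assumptions made for this illustration, not details from the specification.

```python
import numpy as np

def speaker_gains(stored, speaker_ids, n_mics, target_power=1.0):
    """Estimate each speaker's power from the stored covariance matrices and
    derive a gain that brings every speaker to roughly the same output level."""
    gains = {}
    for k in speaker_ids:
        cov = stored[k]                                      # (n_bins, n_ch, n_ch)
        diag = np.diagonal(cov, axis1=1, axis2=2).real       # per-channel power per bin
        power = diag[:, :n_mics].mean()                      # microphone channels only (assumption)
        gains[k] = np.sqrt(target_power / max(power, 1e-12))
    return gains
```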
Whitening means for multiplying the stored covariance matrix by a gain that smooths the frequency characteristic of the diagonal component having the largest power among the diagonal components of the stored covariance matrix, or of the sum of the diagonal components of the stored covariance matrix, and for inputting the whitened covariance matrix to the L channel filter coefficient calculating means and the R channel filter coefficient calculating means;
The sound collecting device according to any one of claims 9 to 13.
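
One way to picture the whitening means: compute the frequency characteristic named in the claim (the sketch below uses the sum-of-diagonal-components variant), smooth it across frequency, and scale each bin's covariance matrix so that this characteristic becomes flat. A minimal sketch under those assumptions; the smoothing window and the names are illustrative only.

```python
import numpy as np

def whiten_covariance(cov, smooth_bins=9, floor=1e-12):
    """Multiply each frequency bin of cov (n_bins, n_ch, n_ch) by a gain that
    flattens the frequency characteristic of the summed diagonal power."""
    power = np.trace(cov, axis1=1, axis2=2).real             # summed diagonal per bin
    kernel = np.ones(smooth_bins) / smooth_bins
    smoothed = np.convolve(power, kernel, mode='same')       # smooth over frequency
    gain = smoothed.mean() / np.maximum(smoothed, floor)     # flattening gain per bin
    return cov * gain[:, None, None]
```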
FFT means for converting the signals received by each of the plurality of sound collecting means and the received signals from time domain signals to frequency domain signals;
IFFT means for converting the output signals of the L channel adding means and the R channel adding means from frequency domain signals to time domain signals;
Each of the above means operating in the frequency domain;
The sound collecting device according to any one of claims 9 to 14.
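
Operationally, the FFT and IFFT means wrap the filter-and-sum structure in a frame-by-frame frequency-domain loop: transform each input channel, multiply by the per-bin filter coefficients, sum across channels for each output channel, and transform back. The framing details and the names below (`filters_L`, `filters_R`) are assumptions for this sketch; only that general flow follows the claim.

```python
import numpy as np

def process_frame(frames, filters_L, filters_R, n_fft=512):
    """One frame of frequency-domain filter-and-sum.

    frames    : (n_ch, n_fft) windowed time-domain samples of the M microphone
                signals plus the L and R received signals
    filters_L : (n_ch, n_bins) complex L channel filter coefficients
    filters_R : (n_ch, n_bins) complex R channel filter coefficients
    Returns the L and R channel time-domain output frames.
    """
    spectra = np.fft.rfft(frames, n=n_fft, axis=-1)          # FFT means
    out_L = (filters_L * spectra).sum(axis=0)                # L channel filters + adder
    out_R = (filters_R * spectra).sum(axis=0)                # R channel filters + adder
    return (np.fft.irfft(out_L, n=n_fft),                    # IFFT means
            np.fft.irfft(out_R, n=n_fft))
```

In a real-time system this per-frame processing would sit inside a windowed overlap-add loop; that outer loop is omitted here.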
The sound collecting device according to any one of claims 9 to 15, wherein the L and R channel filter coefficient calculating means, the L and R channel filter means, and the L and R channel adding means are replaced with 1st to J-th channel filter coefficient calculating means, 1st to J-th channel filter means, and 1st to J-th channel adding means for three or more channels.
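
Generalising from L/R to J output channels, as in the claim above, mostly means carrying one filter set per output channel. A compact sketch under the same assumptions as the previous example:

```python
import numpy as np

def process_frame_multichannel(frames, filters, n_fft=512):
    """filters: (J, n_ch, n_bins) coefficients, one filter set per output channel;
    returns (J, n_fft) time-domain output frames, one per output channel."""
    spectra = np.fft.rfft(frames, n=n_fft, axis=-1)          # (n_ch, n_bins)
    outputs = np.einsum('jcb,cb->jb', filters, spectra)      # per-channel filter-and-sum
    return np.fft.irfft(outputs, n=n_fft, axis=-1)
```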
A sound collection program for causing a computer to execute the sound collection method according to any one of claims 1 to 8.
A recording medium on which the sound collection program according to claim 17 is recorded.
JP2003370697A 2003-10-30 2003-10-30 Sound collection method, apparatus, program, and recording medium Expired - Fee Related JP4298466B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003370697A JP4298466B2 (en) 2003-10-30 2003-10-30 Sound collection method, apparatus, program, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003370697A JP4298466B2 (en) 2003-10-30 2003-10-30 Sound collection method, apparatus, program, and recording medium

Publications (2)

Publication Number Publication Date
JP2005136709A JP2005136709A (en) 2005-05-26
JP4298466B2 true JP4298466B2 (en) 2009-07-22

Family

ID=34647631

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003370697A Expired - Fee Related JP4298466B2 (en) 2003-10-30 2003-10-30 Sound collection method, apparatus, program, and recording medium

Country Status (1)

Country Link
JP (1) JP4298466B2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2426168B (en) * 2005-05-09 2008-08-27 Sony Comp Entertainment Europe Audio processing
JP2009116245A (en) * 2007-11-09 2009-05-28 Yamaha Corp Speech enhancement device
JP5022459B2 (en) * 2010-03-03 2012-09-12 日本電信電話株式会社 Sound collection device, sound collection method, and sound collection program
EP2560161A1 (en) 2011-08-17 2013-02-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Optimal mixing matrices and usage of decorrelators in spatial audio processing
KR102112018B1 (en) * 2013-11-08 2020-05-18 한국전자통신연구원 Apparatus and method for cancelling acoustic echo in teleconference system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07250397A (en) * 1994-03-09 1995-09-26 Nippon Telegr & Teleph Corp <Ntt> Echo cancellation method and equipment embodying this method
JPH1042396A (en) * 1996-07-23 1998-02-13 Sanyo Electric Co Ltd Acoustic image controller
JPH10257598A (en) * 1997-03-14 1998-09-25 Nippon Telegr & Teleph Corp <Ntt> Sound signal synthesizer for localizing virtual sound image
JP3541339B2 (en) * 1997-06-26 2004-07-07 富士通株式会社 Microphone array device
JPH11304906A (en) * 1998-04-20 1999-11-05 Nippon Telegr & Teleph Corp <Ntt> Sound-source estimation device and its recording medium with recorded program
JP3878892B2 (en) * 2002-08-21 2007-02-07 日本電信電話株式会社 Sound collection method, sound collection device, and sound collection program
US7716044B2 (en) * 2003-02-07 2010-05-11 Nippon Telegraph And Telephone Corporation Sound collecting method and sound collecting device
JP4119328B2 (en) * 2003-08-15 2008-07-16 日本電信電話株式会社 Sound collection method, apparatus thereof, program thereof, and recording medium thereof.

Also Published As

Publication number Publication date
JP2005136709A (en) 2005-05-26

Similar Documents

Publication Publication Date Title
US9922663B2 (en) Voice signal processing method and apparatus
JP5654513B2 (en) Sound identification method and apparatus
US9210504B2 (en) Processing audio signals
EP2749016B1 (en) Processing audio signals
JP4286637B2 (en) Microphone device and playback device
US9232309B2 (en) Microphone array processing system
KR101934999B1 (en) Apparatus for removing noise and method for performing thereof
JP7352291B2 (en) sound equipment
JP4249729B2 (en) Automatic gain control method, automatic gain control device, automatic gain control program, and recording medium recording the same
WO2004071130A1 (en) Sound collecting method and sound collecting device
JP5611970B2 (en) Converter and method for converting audio signals
CN1902901A (en) System and method for enhanced subjective stereo audio
JP2001309483A (en) Sound pickup method and sound pickup device
CN105284133A (en) Apparatus and method for center signal scaling and stereophonic enhancement based on a signal-to-downmix ratio
JP5034607B2 (en) Acoustic echo canceller system
JP5762479B2 (en) Voice switch device, voice switch method, and program thereof
JP4298466B2 (en) Sound collection method, apparatus, program, and recording medium
JP4116600B2 (en) Sound collection method, sound collection device, sound collection program, and recording medium recording the same
US20130253923A1 (en) Multichannel enhancement system for preserving spatial cues
JP2005064968A (en) Method, device and program for collecting sound, and recording medium
JP5267808B2 (en) Sound output system and sound output method
JP5937451B2 (en) Echo canceling apparatus, echo canceling method and program
JP4080987B2 (en) Echo / noise suppression method and multi-channel loudspeaker communication system
JP2002062900A (en) Sound collecting device and signal receiving device
JP2005062096A (en) Detection method of speaker position, system, program and record medium

Legal Events

Date Code Title Description
RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20050621

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20060417

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20090408

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20090415

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120424

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130424

Year of fee payment: 4

LAPS Cancellation because of no payment of annual fees