JP4298466B2 - Sound collection method, apparatus, program, and recording medium - Google Patents

Sound collection method, apparatus, program, and recording medium

Info

Publication number
JP4298466B2
Authority
JP
Japan
Prior art keywords
channel
received
sound
speaker
covariance matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2003370697A
Other languages
Japanese (ja)
Other versions
JP2005136709A (en)
Inventor
和則 小林
陽一 羽田
澄宇 阪内
末廣 島内
賢一 古家
暁 江村
章俊 片岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP2003370697A priority Critical patent/JP4298466B2/en
Publication of JP2005136709A publication Critical patent/JP2005136709A/en
Application granted granted Critical
Publication of JP4298466B2 publication Critical patent/JP4298466B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Stereophonic Arrangements (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

PROBLEM TO BE SOLVED: To realize sound collection with a sense of sound-image localization even when a speaker is at a location where the opening angle viewed from the microphones is small.
SOLUTION: A covariance matrix computing part 104 calculates covariance matrices from the received sound signals and stores them in a covariance matrix storage part 106 for each speaker location detected by a speaker location detecting part 105. An L channel filter coefficient calculating part 107L and an R channel filter coefficient calculating part 107R then calculate L channel and R channel filter coefficients from the stored covariance matrices and the L channel and R channel mixing coefficients, under the condition that each received speaker voice component is mixed with the mixing coefficients corresponding to its speaker location, and set these coefficients in L channel filters 102L1 to 102LM and R channel filters 102R1 to 102RM. The outputs of the L channel filters 102L1 to 102LM are added by an L channel adder 103L, and the outputs of the R channel filters 102R1 to 102RM are added by an R channel adder 103R, to obtain an L channel output signal and an R channel output signal.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

本発明は、TV会議や音声会議、電話、遠隔講義などの収音方法および装置に関する。   The present invention relates to a sound collection method and apparatus for TV conferences, audio conferences, telephone calls, remote lectures, and the like.

図10は従来技術の収音装置の構成図である。従来技術の収音装置は指向性マイクロホン901Lと901Rで構成され、その指向性の主軸は120°程度の開き角で配置されている。 FIG. 10 shows the configuration of a conventional sound collection device. The prior-art sound pickup device consists of directional microphones 901L and 901R, whose directivity main axes are arranged at an opening angle of about 120°.

2個の指向性マイクロホンを異なる方向を向けて配置することにより、話者の位置によってLチャネルとRチャネルに収音される音声レベルに差が生じる。これらの出力信号を2つのスピーカから再生することにより、音像の定位感のある再生を行うことができる。   By arranging the two directional microphones in different directions, a difference occurs in the sound level picked up by the L channel and the R channel depending on the position of the speaker. By reproducing these output signals from two speakers, it is possible to reproduce the sound image with a sense of localization.

例えば、図10の話者CはLチャネルマイクロホン901Lの主軸方向にいるので、収音された話者Cの音声レベルはLチャネルのほうが大きく、再生したときにLチャネル側のスピーカに音像が定位する。また、LチャネルとRチャネルのマイクロホン901L,901Rの中間にいる話者Aの音声は、両マイクロホンにほぼ同じレベルで収音されるので、LチャネルとRチャネルのスピーカの中間に音像が定位する。 For example, since speaker C in FIG. 10 is in the main-axis direction of the L channel microphone 901L, the picked-up voice of speaker C is higher in level in the L channel, and on playback the sound image is localized at the L channel loudspeaker. The voice of speaker A, who is midway between the L channel and R channel microphones 901L and 901R, is picked up at almost the same level by both microphones, so the sound image is localized midway between the L channel and R channel loudspeakers.

このように、従来技術では音像の定位感のあるステレオ収音を行うことができる。
中島平太郎ら著、応用電気音響、コロナ社出版、日本音響学会編、pp.262−268、昭和54年
In this way, the conventional technique can perform stereo sound collection with a sense of localization of the sound image.
Heitaro Nakajima et al., Applied Electroacoustics, Corona Publishing, edited by the Acoustical Society of Japan, pp. 262-268, 1979

上述した従来技術の収音方法では、以下に示す問題がある。   The above-described conventional sound collection method has the following problems.

音声の距離減衰の影響により、マイクロホンから距離が離れている話者の音声レベルが小さく聞き取りづらい。もし、マイクの感度を上昇させ距離の離れている話者に対して適正なレベルとしたとしても、マイクロホンに近い話者の音声が過大なレベルとなる。 Due to the distance attenuation of sound, the voice of a speaker far from the microphone is low in level and hard to hear. Even if the microphone sensitivity is raised so that a distant speaker is picked up at an appropriate level, the voice of a speaker close to the microphone then becomes excessively loud.

図10の話者Aと話者Bのようにマイクロホンから見た開き角が小さい場合、話者Aと話者Bのマイクロホン間のレベル差はほぼ同じとなり、音像の定位感が得られなくなる。 When the opening angle viewed from the microphones is small, as with speaker A and speaker B in FIG. 10, the inter-microphone level differences for speaker A and for speaker B become almost identical, and no sense of sound-image localization is obtained.

雑音や、スピーカから再生される受話信号がマイクロホンに収音され、聞き取りづらい音声となる。 Noise and the received signal reproduced from the loudspeaker are also picked up by the microphones, making the transmitted speech hard to hear.

複数人でテーブルを囲むTV会議や音声会議に従来技術を適用した場合、上記の問題を生じ、高品質な音声で音像の定位感のある通信を行うことが難しい。   When the conventional technology is applied to a TV conference or an audio conference in which a table is surrounded by a plurality of people, the above-described problem occurs, and it is difficult to perform communication with a sense of localization of sound images with high-quality audio.

本発明の目的は、マイクロホンから距離が離れている話者の音声を適正レベルにすることで聞き取りやすい音量での通話を実現し、マイクロホンから見た開き角が小さい位置に話者がいる場合でもLRチャネル間のレベル差を所望のレベル差とすることで音像の定位感のある収音を実現し、雑音と受話信号を抑圧した高品質な送話音声を得る収音方法、装置、およびプログラムを提供することである。 An object of the present invention is to provide a sound collection method, apparatus, and program that realize calls at an easy-to-hear volume by bringing the voice of a speaker far from the microphones to an appropriate level, that realize sound collection with a sense of sound-image localization by giving the L and R channels a desired level difference even when a speaker is at a position where the opening angle viewed from the microphones is small, and that obtain high-quality transmitted speech in which noise and the received signal are suppressed.

本発明の第1の態様によれば、収音方法は、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階を有する。
According to the first aspect of the present invention, the sound collection method comprises:
a speaker position detection step of detecting speaker positions from the received signals picked up by each of a plurality of sound pickup means;
a covariance matrix calculation step of calculating a covariance matrix from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix storage step of storing the covariance matrix for each speaker position;
an L channel mixing coefficient setting step of setting in advance an L channel mixing coefficient corresponding to each speaker position;
an L channel filter coefficient calculation step of calculating L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients, under the condition that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to that speaker position;
an R channel mixing coefficient setting step of setting in advance an R channel mixing coefficient corresponding to each speaker position;
an R channel filter coefficient calculation step of calculating R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients, under the condition that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to that speaker position;
an L channel filter step of filtering each of the signals received by the plurality of sound pickup means with the L channel filter coefficients;
an R channel filter step of filtering each of the signals received by the plurality of sound pickup means with the R channel filter coefficients;
an L channel addition step of adding the output signals of the L channel filter step; and
an R channel addition step of adding the output signals of the R channel filter step.

複数マイクロホンからの受音信号から話者位置を検出し、共分散行列を求め、話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。 The speaker position is detected from the signals received by a plurality of microphones, covariance matrices are obtained, filter coefficients that give the desired level difference between the L and R channels are obtained for each speaker position, and the microphone signals are filtered for each channel with these filter coefficients; a stereo output signal having the desired level difference for each speaker position can thereby be obtained.

本発明の第2の態様によれば、収音方法は、
複数の収音手段の各々で受音された受音信号から話者位置と雑音区間を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階を有する。
According to the second aspect of the present invention, the sound collection method comprises:
a speaker position detection step of detecting speaker positions and noise sections from the received signals picked up by each of a plurality of sound pickup means;
a covariance matrix calculation step of calculating a covariance matrix from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix storage step of storing the covariance matrix for each noise section and each speaker position;
an L channel mixing coefficient setting step of setting in advance an L channel mixing coefficient corresponding to each speaker position;
an L channel filter coefficient calculation step of calculating L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to that speaker position and that the noise component is suppressed;
an R channel mixing coefficient setting step of setting in advance an R channel mixing coefficient corresponding to each speaker position;
an R channel filter coefficient calculation step of calculating R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to that speaker position and that the noise component is suppressed;
an L channel filter step of filtering each of the signals received by the plurality of sound pickup means with the L channel filter coefficients;
an R channel filter step of filtering each of the signals received by the plurality of sound pickup means with the R channel filter coefficients;
an L channel addition step of adding the output signals of the L channel filter step; and
an R channel addition step of adding the output signals of the R channel filter step.

話者位置および雑音区間を推定し、各話者位置に対する共分散行列と雑音に対する共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、雑音の抑圧と、各話者からの音声信号がLチャネルとRチャネルでレベル差を持った良好な音像の定位が実現する。 The speaker positions and noise sections are estimated, a covariance matrix for each speaker position and a covariance matrix for the noise are stored, and the L channel and R channel filter coefficients are obtained from them; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. This realizes noise suppression together with good sound-image localization in which the voice signal from each speaker has a level difference between the L channel and the R channel.

本発明の第3の態様によれば、収音方法は、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出段階と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階を有する。
According to the third aspect of the present invention, the sound collection method comprises:
a transmission/reception detection step of detecting transmission sections, reception sections, and noise sections from the received signals picked up by each of a plurality of sound pickup means and from the L channel and R channel received signals from the communication partner;
a speaker position detection step of detecting speaker positions from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix calculation step of calculating a covariance matrix from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix storage step of storing the covariance matrix for each reception section, each noise section, and each speaker position;
an L channel mixing coefficient setting step of setting in advance an L channel mixing coefficient corresponding to each speaker position;
an L channel filter coefficient calculation step of calculating L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to that speaker position and that the received-signal component and the noise component are suppressed;
an R channel mixing coefficient setting step of setting in advance an R channel mixing coefficient corresponding to each speaker position;
an R channel filter coefficient calculation step of calculating R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to that speaker position and that the received-signal component and the noise component are suppressed;
an L channel filter step of filtering each of the signals received by the plurality of sound pickup means with the L channel filter coefficients;
an R channel filter step of filtering each of the signals received by the plurality of sound pickup means with the R channel filter coefficients;
an L channel addition step of adding the output signals of the L channel filter step; and
an R channel addition step of adding the output signals of the R channel filter step.

受話信号とマイクロホン受音信号から受話区間、送話区間、雑音区間を検出し、送話区間であった場合に話者位置を推定し、各話者位置に対する共分散行列と雑音の共分散行列とエコーの共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、雑音とエコーが抑圧され、各話者からの音声信号がLチャネルとRチャネルでレベル差を持ち、高品質な音声での通話と良好な音像の定位が実現する。 The reception sections, transmission sections, and noise sections are detected from the received (far-end) signals and the microphone signals, and the speaker position is estimated when a transmission section is detected. A covariance matrix for each speaker position, a noise covariance matrix, and an echo covariance matrix are stored and used to obtain the L channel and R channel filter coefficients; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. Noise and echo are thereby suppressed, and the voice signal from each speaker has a level difference between the L channel and the R channel, realizing calls with high-quality speech and good sound-image localization.

本発明の第4の態様によれば、収音方法は、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出段階と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階を有する。
According to the fourth aspect of the present invention, the sound collection method comprises:
a transmission/reception detection step of detecting transmission sections, reception sections, and noise sections from the received signals picked up by each of a plurality of sound pickup means and from the L channel and R channel received signals from the communication partner;
a speaker position detection step of detecting speaker positions from the received signals picked up by each of the plurality of sound pickup means;
a covariance matrix calculation step of calculating a covariance matrix from the received signals picked up by each of the plurality of sound pickup means, the L channel received signal, and the R channel received signal;
a covariance matrix storage step of storing the covariance matrix for each reception section, each noise section, and each speaker position;
an L channel mixing coefficient setting step of setting in advance an L channel mixing coefficient corresponding to each speaker position;
an L channel filter coefficient calculation step of calculating L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to that speaker position and that the received-signal component and the noise component are suppressed;
an R channel mixing coefficient setting step of setting in advance an R channel mixing coefficient corresponding to each speaker position;
an R channel filter coefficient calculation step of calculating R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients, under the conditions that each speaker voice component picked up by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to that speaker position and that the received-signal component and the noise component are suppressed;
an L channel filter step of filtering the signals received by each of the plurality of sound pickup means, the L channel received signal, and the R channel received signal with the L channel filter coefficients;
an R channel filter step of filtering the signals received by each of the plurality of sound pickup means, the L channel received signal, and the R channel received signal with the R channel filter coefficients;
an L channel addition step of adding the output signals of the L channel filter step; and
an R channel addition step of adding the output signals of the R channel filter step.

複数マイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。第3の態様の収音方法に受話信号をフィルタリングすることを追加したことにより、第3の態様の収音方法よりも高いエコー抑圧が実現する。 Reception sections, noise sections, and transmission sections are detected from the signals received by a plurality of microphones; in transmission sections the speaker position is detected and covariance matrices are obtained; filter coefficients are obtained that suppress echo and noise and give the desired level difference between the L and R channels for each speaker position; and the microphone signals are filtered for each channel with these filter coefficients, so that echo and noise are suppressed and a stereo output signal having the desired level difference for each speaker position is obtained. Because the received signals are also filtered, in addition to the processing of the sound collection method of the third aspect, echo suppression higher than that of the third aspect is realized.

本発明の第1の実施態様によれば、収音方法は、
前記記憶された各話者の共分散行列から各話者の音声レベルを推定する話者音声レベル推定段階と、
前記各話者の音声レベルから、各話者音声が適正レベルで出力されるための各話者に対するゲインを各々算出するゲイン算出部とをさらに有し、
前記Lチャネルフィルタ係数計算段階は、さらに前記各話者に対するゲインが乗算され、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出し、
前記Rチャネルフィルタ係数計算段階は、さらに前記各話者に対するゲインが乗算され、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出する。
According to the first embodiment of the present invention, the sound collection method further comprises:
a speaker voice level estimation step of estimating the voice level of each speaker from the stored covariance matrix of each speaker; and
a gain calculation part that calculates, from the voice level of each speaker, a gain for each speaker such that each speaker's voice is output at an appropriate level,
wherein the L channel filter coefficient calculation step calculates the L channel filter coefficients from the stored covariance matrices and the L channel mixing coefficients under the further conditions that the gain for each speaker is multiplied in, the received-signal component is suppressed, and the noise component is suppressed, and
the R channel filter coefficient calculation step calculates the R channel filter coefficients from the stored covariance matrices and the R channel mixing coefficients under the further conditions that the gain for each speaker is multiplied in, the received-signal component is suppressed, and the noise component is suppressed.

複数マイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与え、各話者から発せられた音を適正レベルで収音するフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、各話者から発せられた音を適正レベルで収音し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。 Reception sections, noise sections, and transmission sections are detected from the signals received by a plurality of microphones; in transmission sections the speaker position is detected and covariance matrices are obtained; filter coefficients are obtained that suppress echo and noise, give the desired level difference between the L and R channels for each speaker position, and pick up the sound emitted by each speaker at an appropriate level; and the microphone signals are filtered for each channel with these filter coefficients, so that echo and noise are suppressed, the sound emitted by each speaker is picked up at an appropriate level, and a stereo output signal having the desired level difference for each speaker position is obtained.

本発明の第2の実施態様によれば、収音方法は、
前記記憶された共分散行列のうち対角成分で最もパワーの大きい成分、または前記記憶された共分散行列の対角成分の加算値の周波数特性を平滑化するゲインを、前記記憶された共分散行列に乗算し、白色化された共分散行列を、前記Lチャネルフィルタ係数計算段階と前記Rチャネルフィルタ係数計算段階に入力する白色化段階をさらに有する。
According to the second embodiment of the present invention, the sound collection method further comprises a whitening step of multiplying the stored covariance matrices by a gain that smooths the frequency characteristic of either the diagonal element with the largest power or the sum of the diagonal elements of the stored covariance matrices, and of inputting the whitened covariance matrices to the L channel filter coefficient calculation step and the R channel filter coefficient calculation step.

共分散行列の白色化により、音源の周波数特性に依存しないフィルタを求めることができる。これにより、音源の周波数特性が変化しても、フィルタ係数の変化がなく、本発明の処理による音色の変化を防ぐことができる。   By whitening the covariance matrix, a filter that does not depend on the frequency characteristics of the sound source can be obtained. Thereby, even if the frequency characteristic of the sound source changes, there is no change in the filter coefficient, and the change in timbre due to the processing of the present invention can be prevented.
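A minimal Python sketch of such a whitening step is given below. It assumes the gain is taken per frequency bin as the reciprocal of the summed diagonal elements of the covariance matrix; the patent also allows using the largest-power diagonal element, and its exact smoothing of the spectrum is not reproduced here.

import numpy as np

def whiten_covariances(R, eps=1e-12):
    """Scale each per-bin covariance matrix so that its overall power spectrum is flat.
    The gain used here is 1 / trace(R(omega)); using the largest diagonal element
    instead, or smoothing the spectrum first, are the variants mentioned in the text."""
    R = np.asarray(R)                       # shape (n_bins, M, M)
    power = np.einsum("kii->k", R).real     # sum of diagonal elements per bin
    gain = 1.0 / (power + eps)
    return R * gain[:, None, None]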

本発明の第3の実施態様によれば、収音方法は、
前記複数の収音手段の各々で受音された信号および前記受話信号の時間領域信号から周波数領域信号に変換するFFT段階と、
前記Lチャネル加算段階と前記Rチャネル加算段階の出力信号を周波数領域信号から時間領域信号に変換するIFFT段階をさらに有し、
前記各段階は周波数領域で演算する。
According to the third embodiment of the present invention, the sound collection method further comprises:
an FFT step of converting the signals received by each of the plurality of sound pickup means and the received (far-end) signals from time-domain signals to frequency-domain signals; and
an IFFT step of converting the output signals of the L channel addition step and the R channel addition step from frequency-domain signals to time-domain signals,
wherein each of the above steps operates in the frequency domain.

これにより、時間領域の演算に比べ低演算量を実現できる。   Thereby, a low calculation amount can be realized as compared with the calculation in the time domain.

本発明の第4の実施態様によれば、収音方法は、
前記LおよびRチャネルフィルタ係数計算段階と前記LおよびRチャネルフィルタ段階と前記LおよびRチャネル加算段階を、3チャネル以上の1〜Jチャネルフィルタ係数計算段階と1〜Jチャネルフィルタ段階と1〜Jチャネル加算段階に置き換えている。
According to the fourth embodiment of the present invention, in the sound collection method the L and R channel filter coefficient calculation steps, the L and R channel filter steps, and the L and R channel addition steps are replaced with filter coefficient calculation steps, filter steps, and addition steps for three or more channels 1 to J.

複数マイクロホンで収音した信号および受話信号から、以下の条件を満たす指向性を形成するLチャネルフィルタ係数とRチャネルフィルタ係数を求める。(条件1)マイクロホンから距離が離れている話者の音声を適切レベル(聞き取りやすいレベル)にする。(条件2)マイクロホンから見た開き角が小さい位置に話者がいる場合でもLRチャネルのレベル差を所望のレベル差(音像の定位感のあるレベル差)とする。(条件3)雑音と受話信号を抑圧する。次に、求められたLチャネルフィルタ係数とRチャネルフィルタ係数で複数マイクロホンで収音した信号および受話信号をフィルタリングし、それらの出力をLチャネル、Rチャネルごとに加算する。   An L channel filter coefficient and an R channel filter coefficient that form directivity that satisfies the following conditions are obtained from signals and received signals collected by a plurality of microphones. (Condition 1) The voice of a speaker who is far away from the microphone is set to an appropriate level (a level that is easy to hear). (Condition 2) Even when a speaker is at a position where the opening angle viewed from the microphone is small, the level difference of the LR channel is set to a desired level difference (a level difference with a sense of localization of a sound image). (Condition 3) Noise and received signal are suppressed. Next, the signal collected by the plurality of microphones and the received signal are filtered with the obtained L channel filter coefficient and R channel filter coefficient, and their outputs are added for each L channel and R channel.

これにより、マイクロホンから距離が離れている話者とマイクロホンに近い話者の音声レベルが適切となり、聞き取りやすい音量での通話が実現する。また、マイクロホンから見た開き角が小さい位置に話者がいる場合でもLRチャネル間で所望のレベル差となり、音像の定位感のある収音を実現する。さらに、雑音と受話信号を抑圧した高品質な音声での通信が実現する。   As a result, the voice levels of the speaker far from the microphone and the speaker close to the microphone are appropriate, and a call with a volume that is easy to hear is realized. Further, even when a speaker is located at a position where the opening angle viewed from the microphone is small, a desired level difference is obtained between the LR channels, and sound collection with a sense of localization of the sound image is realized. Furthermore, high-quality voice communication with suppressed noise and received signal is realized.

[第1の実施形態]
図1は本発明の第1の実施形態の収音装置のブロック図である。
[First Embodiment]
FIG. 1 is a block diagram of a sound collecting apparatus according to a first embodiment of the present invention.

本実施形態の収音装置は、マイクロホン1011〜101Mと、Lチャネルフィルタ102L1〜102LMと、Rチャネルフィルタ102R1〜102RMと、Lチャネル加算器103Lと、Rチャネル加算器103Rと、話者位置検出部105と、共分散行列計算部104と、共分散行列記憶部106と、Lチャネルフィルタ係数計算部107Lと、Rチャネルフィルタ係数計算部107Rと、Lチャネルミキシング係数設定部203Lと、Rチャネルミキシング係数設定部203Rにより構成される。 The sound collection device of this embodiment comprises microphones 1011 to 101M, L channel filters 102L1 to 102LM, R channel filters 102R1 to 102RM, an L channel adder 103L, an R channel adder 103R, a speaker position detection unit 105, a covariance matrix calculation unit 104, a covariance matrix storage unit 106, an L channel filter coefficient calculation unit 107L, an R channel filter coefficient calculation unit 107R, an L channel mixing coefficient setting unit 203L, and an R channel mixing coefficient setting unit 203R.

本実施形態は、話者位置を推定し、各話者位置に対する共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、各話者からの音声信号がLチャネルとRチャネルで所望のレベル差を持ち、良好な音像の定位が実現する。 In this embodiment, the speaker positions are estimated, a covariance matrix for each speaker position is stored, and the L channel and R channel filter coefficients are obtained from these matrices; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. The voice signal from each speaker thereby has the desired level difference between the L channel and the R channel, and good sound-image localization is realized.

まず、話者位置検出部105は、マイクロホン1011〜101Mで受音したマイクロホン受音信号から、話者の位置を検出する。音源位置の推定方法は、例えば相互相関法による方法がある。 First, the speaker position detection unit 105 detects the position of the speaker from the microphone reception signal received by the microphones 101 1 to 101 M. As a sound source position estimation method, for example, there is a method based on a cross-correlation method.

M個のマイクロホンがあると想定し、i番目マイクロホン100iとj番目マイクロホン100jで受音された信号より求められる受音信号間遅延時間差を Assuming that there are M microphones, the delay time difference between the received sound signals obtained from the signals received by the i-th microphone 100 i and the j-th microphone 100 j is

Figure 0004298466
Figure 0004298466

とする。受音信号間遅延時間差は、信号間の相互相関を求め、その最大ピーク位置から求めることができる。次に、m番目の受音位置を(xm,ym,zm)、推定音源位置を And The delay time difference between the received sound signals can be obtained from the maximum peak position by obtaining the cross-correlation between the signals. Next, the mth sound receiving position is (x m , y m , z m ), and the estimated sound source position is

Figure 0004298466
Figure 0004298466

と表す。これらの位置から求められる推定受音信号間遅延時間差 The estimated delay time difference between the received signals, obtained from these positions,

Figure 0004298466
Figure 0004298466

は、式(1)で表される。 Is represented by Formula (1).

Figure 0004298466
Figure 0004298466

次に、受音信号間遅延時間差 Next, the delay time difference between the received signals

Figure 0004298466
Figure 0004298466

に音速cを乗じ距離に換算したものを、それぞれ受音位置間距離差 Multiplied by the speed of sound c and converted into a distance,

Figure 0004298466
Figure 0004298466

とし、測定値 And measured value

Figure 0004298466
Figure 0004298466

と推定値 And estimated value

Figure 0004298466
Figure 0004298466

の二乗平均誤差 Mean square error of

Figure 0004298466
Figure 0004298466

を求めれば、式(2)となる。 Is obtained, Equation (2) is obtained.

Figure 0004298466
Figure 0004298466

式(2)の二乗平均誤差   Mean square error of equation (2)

Figure 0004298466
Figure 0004298466

を最小化する解を求めれば、受音信号間遅延時間差の測定値と推定値の誤差が最小となる推定音源位置を求めることができる。ただし、式(2)は非線形連立方程式となっており、解析的に解くことは困難であるので、逐次修正を用いた数値解析により求める。 If the solution for minimizing the difference is obtained, the estimated sound source position at which the error between the measured value and the estimated value of the delay time difference between the received sound signals is minimized can be obtained. However, since Equation (2) is a nonlinear simultaneous equation and difficult to solve analytically, it is obtained by numerical analysis using sequential correction.

式(2)を最小化する推定音源位置   Estimated sound source position that minimizes Equation (2)

Figure 0004298466
Figure 0004298466

を求めるには、ある点における勾配を求め、誤差が小さくなる方向に推定音源位置を修正していき、勾配が0となる点を求めればよいので、修正式は式(3)のようになる。 Is obtained by calculating the gradient at a certain point, correcting the estimated sound source position in the direction in which the error becomes smaller, and determining the point at which the gradient becomes 0. The correction formula is as shown in equation (3). .

Figure 0004298466
Figure 0004298466

以上、式(3)を繰返し計算することで、誤差が最小となる推定音源位置を求めることができる。 As described above, the estimated sound source position where the error is minimized can be obtained by repeatedly calculating Expression (3).
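As an illustration only, the following Python sketch follows the same idea: measure inter-microphone delays from cross-correlation peaks, convert them to distance differences, and iteratively move a source estimate so as to reduce the squared error of equation (2). The step size, iteration count, and initial guess are assumptions, and the update is a plain gradient step rather than the exact correction formula (3), which is given only as an image.

import numpy as np

def delay_diff(xi, xj, fs):
    """Delay (s) of xj relative to xi, taken from the cross-correlation peak."""
    corr = np.correlate(xi, xj, mode="full")
    lag = np.argmax(corr) - (len(xj) - 1)
    return lag / fs

def estimate_source_position(mic_signals, mic_positions, fs, c=340.0, steps=200, mu=0.01):
    """Iteratively move the source estimate so that the modelled inter-microphone
    distance differences match the measured ones (cf. equations (1)-(3))."""
    mic_positions = np.asarray(mic_positions, dtype=float)
    M = len(mic_positions)
    pairs = [(i, j) for i in range(M) for j in range(i + 1, M)]
    d_meas = {(i, j): c * delay_diff(mic_signals[i], mic_signals[j], fs) for i, j in pairs}
    p = mic_positions.mean(axis=0) + 1.0            # assumed initial guess offset from the array centre
    for _ in range(steps):
        grad = np.zeros(3)
        for i, j in pairs:
            ri = np.linalg.norm(p - mic_positions[i])
            rj = np.linalg.norm(p - mic_positions[j])
            err = (ri - rj) - d_meas[(i, j)]        # one squared-error term of equation (2)
            grad += 2.0 * err * ((p - mic_positions[i]) / ri - (p - mic_positions[j]) / rj)
        p -= mu * grad                              # move against the gradient, in the spirit of equation (3)
    return p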

次に、共分散行列計算部104では、マイクロホン受音信号の共分散を求め、それを行列にする。まず、マイクロホン受音信号の周波数領域変換信号をX1(ω)〜XM(ω)とする。これらの信号の共分散行列 Next, the covariance matrix calculation unit 104 computes the covariances of the microphone signals and arranges them as a matrix. First, let the frequency-domain transforms of the microphone signals be X1(ω) to XM(ω). The covariance matrix of these signals

Figure 0004298466
Figure 0004298466

は、式(9)により算出される。 Is calculated by equation (9).

Figure 0004298466
Figure 0004298466
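Equation (9) itself is shown only as an image; the sketch below assumes the common estimate obtained by averaging X(ω)X(ω)^H over short-time FFT frames of the M microphone signals.

import numpy as np

def covariance_matrices(mic_signals, frame_len=512, hop=256):
    """Per-frequency covariance matrices R(omega) ~ average of X(omega) X(omega)^H,
    estimated over short-time FFT frames (assumed reading of equation (9))."""
    X = np.stack(mic_signals)                        # shape (M, n_samples)
    M, n = X.shape
    win = np.hanning(frame_len)
    n_bins = frame_len // 2 + 1
    R = np.zeros((n_bins, M, M), dtype=complex)
    frames = 0
    for start in range(0, n - frame_len + 1, hop):
        spec = np.fft.rfft(win * X[:, start:start + frame_len], axis=1)   # (M, n_bins)
        for k in range(n_bins):
            x = spec[:, k][:, None]                  # column vector X(omega_k)
            R[k] += x @ x.conj().T
        frames += 1
    return R / max(frames, 1)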

次に、共分散行列記憶部106では、話者位置検出部105の検出結果に基づき、共分散行列   Next, in the covariance matrix storage unit 106, the covariance matrix is based on the detection result of the speaker position detection unit 105.

Figure 0004298466
Figure 0004298466

を、各音源位置に対する共分散行列 Is the covariance matrix for each source location

Figure 0004298466
Figure 0004298466

として保存する。 Save as.

Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を所望のレベル差で収音するためのフィルタ係数を計算する。まず、各マイクロホンに接続されたLチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMのフィルタ係数を周波数領域に変換したものを、それぞれHL,1(ω)〜HL,M(ω)とHR,1(ω)〜HR,M(ω)とする。次に、これらのフィルタ係数を式(10)と式(11)により行列としたものを The L channel filter coefficient calculation unit 107 L and the R channel filter coefficient calculation unit 107 R calculate filter coefficients for collecting sounds emitted from the speakers with a desired level difference. First, the filter coefficients of the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM connected to the respective microphones are converted to the frequency domain, respectively, H L, 1 (ω) to H L, M (Ω) and H R, 1 (ω) to H R, M (ω). Next, these filter coefficients are converted into a matrix according to equations (10) and (11).

Figure 0004298466
Figure 0004298466

とする。 And

Figure 0004298466
Figure 0004298466

また、i番目音源が発音している期間のマイクロホン受音信号の周波数領域変換信号をXSi,1(ω)〜XSi,M(ω)とする。 Also, let X Si, 1 (ω) to X Si, M (ω) be the frequency domain transform signal of the microphone sound reception signal during the period when the i-th sound source is sounding.

ここで、フィルタ係数行列   Where the filter coefficient matrix

Figure 0004298466
Figure 0004298466

に要求される条件は、i番目話者のマイクロホン受音信号XSi,1(ω)〜XSi,M(ω)をフィルタ係数行列 The condition required of these is that, when the microphone signals XSi,1(ω) to XSi,M(ω) of the i-th speaker are filtered with the filter coefficient matrices

Figure 0004298466
Figure 0004298466

でそれぞれフィルタリングし、フィルタリング後の信号をチャネルごとに加算したときに、LチャネルとRチャネルの各話者音声信号がi番目話者位置(Xi,Yi,Zi)に対応した所望のレベル差PDiff(Xi,Yi,Zi)(良好な音像定位を実現するレベル差であり、あらかじめ話者位置ごとに設定される)となっていることである。したがって、各音源の信号をそれぞれフィルタリングおよび加算した信号が所望のレベル差PDiff(Xi,Yi,Zi)となるようにM行のミキシング係数行列 respectively, and the filtered signals are added for each channel, the L channel and R channel speaker voice signals have the desired level difference PDiff(Xi, Yi, Zi) corresponding to the i-th speaker position (Xi, Yi, Zi) (a level difference that realizes good sound-image localization, set in advance for each speaker position). Therefore, so that the filtered and added signal of each sound source has the desired level difference PDiff(Xi, Yi, Zi), an M-row mixing coefficient matrix

Figure 0004298466
Figure 0004298466

をマイクロホン受音信号にそれぞれ乗じた信号となる式(12)と式(13)が理想条件となる。 is multiplied by the respective microphone signals, and equations (12) and (13), in which the filtered outputs equal these mixed signals, are the ideal conditions.

Figure 0004298466
Figure 0004298466

所望のレベル差PDiff(Xi,Yi,Zi)を実現するミキシング係数行列 Mixing coefficient matrix realizing desired level difference P Diff (X i , Y i , Z i )

Figure 0004298466
Figure 0004298466

は、Lチャネルミキシング係数設定部203LとRチャネルミキシング係数設定部203Rで話者位置ごとにあらかじめ設定されている。例えば以下に述べるように設定される。図9は本発明において実現する指向性例を示した図である。図9に示すマイクロホンと話者の配置では、マイクロホンから見た開き角が小さい位置に複数の話者が存在する。このような場合、従来技術のステレオマイクロホンでは、ほとんど音像の定位感を得ることはできない。そこで、本発明では音像の定位感を強調する指向性を形成する。例えば、話者位置3に対しては、Lチャネルのレベルを大きくし、Rチャネルのレベルを小さくして、大きなレベル差が付くようにする。このときミキシング係数行列は is set in advance for each speaker position by the L channel mixing coefficient setting unit 203L and the R channel mixing coefficient setting unit 203R, for example as described below. FIG. 9 shows an example of the directivity realized by the present invention. In the arrangement of microphones and speakers shown in FIG. 9, several speakers are located at positions where the opening angle viewed from the microphones is small. In such a case, a conventional stereo microphone can hardly produce any sense of sound-image localization. The present invention therefore forms directivity that emphasizes the sense of sound-image localization. For example, for speaker position 3, the L channel level is made large and the R channel level small so that a large level difference is obtained. In this case the mixing coefficient matrix is set

Figure 0004298466
Figure 0004298466

のように設定する。このようにすることで、話者位置3からの発話に対して20dBのレベル差を得ることができる。他の話者位置に対しても同様にミキシング係数行列を設定すれば、図9に示すような指向性が得られ、良好な音像定位を実現できる。 as shown above. In this way, a level difference of 20 dB can be obtained for utterances from speaker position 3. If mixing coefficient matrices are set similarly for the other speaker positions, the directivity shown in FIG. 9 is obtained and good sound-image localization can be realized.
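The concrete matrix in the patent is shown only as an image; the following hypothetical values merely illustrate how a 20 dB L/R level difference for speaker position 3 could be encoded (the number of microphones M and the coefficient values are assumptions).

import numpy as np

# Hypothetical mixing vectors for speaker position 3 with M = 4 microphones.
# The L channel keeps the speaker at full level while the R channel keeps it at
# 1/10 amplitude, i.e. a 20 dB level difference: 20*log10(1.0 / 0.1) = 20 dB.
M = 4
a_L3 = np.full(M, 1.0 / M)     # L channel mixing coefficients for speaker position 3
a_R3 = np.full(M, 0.1 / M)     # R channel mixing coefficients for speaker position 3
level_diff_db = 20 * np.log10(np.sum(a_L3) / np.sum(a_R3))
print(level_diff_db)           # -> 20.0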

次に、式(12)と式(13)の条件をフィルタ係数行列   Next, the conditions of the equations (12) and (13) are changed to the filter coefficient matrix.

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(14)と式(15)となる。ただし、CSiは音源位置の感度拘束に対する重みの定数であり、値が大きくなるほど感度拘束が強くなる。 Is solved by least squares solution, Equations (14) and (15) are obtained. However, C Si is a constant of weight for the sensitivity constraint of the sound source position, and the sensitivity constraint becomes stronger as the value increases.

Figure 0004298466
Figure 0004298466

以上で、所望のレベル差を得るためのフィルタ係数を求める式(14)と式(15)を導出した。   Thus, the equations (14) and (15) for obtaining the filter coefficient for obtaining the desired level difference are derived.
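Equations (14) and (15) appear only as images. Assuming the usual weighted least-squares reading of conditions (12) and (13), namely minimising, per frequency bin, the sum over speaker positions of C_Si times the power of the difference between the filtered output and the mixed target, the solution takes the form sketched below; this is an assumed reconstruction, not a transcription of the patent's formula.

import numpy as np

def filter_coefficients(R_list, a_list, C_list):
    """Assumed per-bin weighted least-squares filter:
       H(omega) = ( sum_i C_Si R_Si(omega) )^-1 ( sum_i C_Si R_Si(omega) a_i ).
    R_list : covariance matrices (n_bins, M, M), one entry per speaker position
    a_list : length-M mixing-coefficient vectors, one per speaker position
    C_list : sensitivity-constraint weights C_Si."""
    n_bins, M, _ = R_list[0].shape
    H = np.zeros((n_bins, M), dtype=complex)
    for k in range(n_bins):
        A = np.zeros((M, M), dtype=complex)
        b = np.zeros(M, dtype=complex)
        for R, a, C in zip(R_list, a_list, C_list):
            A += C * R[k]
            b += C * (R[k] @ a)
        H[k] = np.linalg.solve(A + 1e-9 * np.eye(M), b)   # small ridge term for numerical stability
    return H

# The L and R channel filters differ only in the mixing vectors passed in, e.g.:
# H_L = filter_coefficients(R_speakers, a_L_list, C_list)
# H_R = filter_coefficients(R_speakers, a_R_list, C_list)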

次に、式(14)と式(15)により求められた、Lチャネルフィルタ係数   Next, the L channel filter coefficient obtained by the equations (14) and (15)

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ係数 And R channel filter coefficients

Figure 0004298466
Figure 0004298466

は、Lチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMにそれぞれコピーされ、マイクロホン受音信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 Are copied to the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM , respectively, and respectively filter the microphone reception signals. The filtered signals are added by adders 103 L and 103 R for each channel, and output as a stereo output signal.
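A sketch of this filtering-and-adding stage in the frequency domain is shown below; windowing and overlap-add reconstruction are omitted, and the array shapes are assumptions.

import numpy as np

def apply_channel_filter(spec_frames, H):
    """Filter the M microphone spectra with per-bin coefficients H of shape (n_bins, M)
    and add across microphones, giving one output channel per frame.
    spec_frames : short-time spectra of the microphones, shape (n_frames, M, n_bins)."""
    # multiply each microphone spectrum by its filter coefficient and sum over microphones
    return np.einsum("fmk,km->fk", spec_frames, H)

# y_L = apply_channel_filter(frames, H_L)   # L channel output spectra
# y_R = apply_channel_filter(frames, H_R)   # R channel output spectra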

以上示したように、本実施形態では、複数マイクロホンの受音信号から話者位置を検出し、共分散行列を求め、話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。   As described above, in this embodiment, the speaker position is detected from the received signals of a plurality of microphones, a covariance matrix is obtained, and a filter coefficient that gives a desired level difference between LR channels is obtained for each speaker position. By filtering the microphone sound reception signal for each channel using these filter coefficients, a stereo output signal having a desired level difference for each speaker position can be obtained.

[第2の実施形態]
本発明の第2の実施形態の収音装置について説明する。本実施形態のブロック図は、第1の実施形態と同じ図1である。本実施形態は、第1の実施形態の収音装置に雑音抑圧機能を加えたものである。
[Second Embodiment]
A sound collection device according to a second embodiment of the present invention will be described. The block diagram of this embodiment is FIG. 1 which is the same as that of the first embodiment. In the present embodiment, a noise suppression function is added to the sound collection device of the first embodiment.

本実施形態の収音装置は、マイクロホン1011〜101Mと、Lチャネルフィルタ102L1〜102LMと、Rチャネルフィルタ102R1〜102RMと、Lチャネル加算器103Lと、Rチャネル加算器103Rと、話者位置検出部105と、共分散行列計算部104と、共分散行列記憶部106と、Lチャネルフィルタ係数計算部107Lと、Rチャネルフィルタ係数計算部107Rと、Lチャネルミキシング係数設定部203Lと、Rチャネルミキシング係数設定部203Rにより構成される。 The sound collection device of this embodiment comprises microphones 1011 to 101M, L channel filters 102L1 to 102LM, R channel filters 102R1 to 102RM, an L channel adder 103L, an R channel adder 103R, a speaker position detection unit 105, a covariance matrix calculation unit 104, a covariance matrix storage unit 106, an L channel filter coefficient calculation unit 107L, an R channel filter coefficient calculation unit 107R, an L channel mixing coefficient setting unit 203L, and an R channel mixing coefficient setting unit 203R.

本実施形態は、話者位置および雑音区間を推定し、各話者位置に対する共分散行列と雑音に対する共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、雑音の抑圧と、各話者からの音声信号がLチャネルとRチャネルでレベル差を持った、良好な音像の定位が実現する。 In this embodiment, the speaker positions and noise sections are estimated, a covariance matrix for each speaker position and a covariance matrix for the noise are stored, and the L channel and R channel filter coefficients are obtained from these matrices; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. This realizes noise suppression together with good sound-image localization in which the voice signal from each speaker has a level difference between the L channel and the R channel.

まず、話者位置検出部105は、マイクロホン1011〜101Mで受音したマイクロホン受音信号のパワーから雑音区間と発話区間を検出する。例えば、それぞれのマイクロホン受音信号について、短時間平均パワー(0.1〜1s程度)と、長時間平均パワー(1s〜100s程度)を求め、短時間平均パワーと長時間平均パワーの比が雑音区間の閾値未満の場合に雑音区間と判定し、発話の閾値以上の場合に発話区間と判定する。発話区間と判定された場合は、第1の実施形態と同様にして、話者位置を検出する。 First, the speaker position detection unit 105 detects noise sections and speech sections from the power of the microphone signals received by the microphones 1011 to 101M. For example, for each microphone signal, the short-time average power (about 0.1 to 1 s) and the long-time average power (about 1 s to 100 s) are obtained; when the ratio of the short-time to the long-time average power is below the noise threshold the section is judged to be a noise section, and when it is at or above the speech threshold it is judged to be a speech section. When a speech section is detected, the speaker position is detected as in the first embodiment.
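A rough sketch of this section classification is shown below; the window lengths follow the ranges given above, while the threshold values are purely illustrative assumptions.

import numpy as np

def classify_samples(x, fs, short_win=0.5, long_win=10.0,
                     noise_thresh=0.5, speech_thresh=2.0):
    """Label each sample 'noise', 'speech', or 'other' from the ratio of
    short-time to long-time average power; thresholds are illustrative only."""
    p = x.astype(float) ** 2
    short = np.convolve(p, np.ones(int(short_win * fs)) / (short_win * fs), mode="same")
    long_ = np.convolve(p, np.ones(int(long_win * fs)) / (long_win * fs), mode="same")
    ratio = short / (long_ + 1e-12)
    labels = np.full(len(x), "other", dtype=object)
    labels[ratio < noise_thresh] = "noise"
    labels[ratio >= speech_thresh] = "speech"
    return labels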

次に、共分散行列計算部104は、第1の実施形態と同様にして、共分散行列   Next, the covariance matrix calculation unit 104 performs the covariance matrix in the same manner as in the first embodiment.

Figure 0004298466
Figure 0004298466

を算出する。 Is calculated.

共分散行列記憶部108では、話者位置検出部105の検出結果に基づき、共分散行列   In the covariance matrix storage unit 108, the covariance matrix is based on the detection result of the speaker position detection unit 105.

Figure 0004298466
Figure 0004298466

を、各音源位置に対する共分散行列 Is the covariance matrix for each source location

Figure 0004298466
Figure 0004298466

と雑音の共分散行列 And noise covariance matrix

Figure 0004298466
Figure 0004298466

として保存する。 Save as.

Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を所望のレベル差で収音し、雑音を抑圧するためのフィルタ係数を計算する。ここで、各マイクロホンに接続されたLチャネルフィルタ The L channel filter coefficient calculation unit 107L and the R channel filter coefficient calculation unit 107R calculate filter coefficients for picking up the sound emitted by each speaker with the desired level difference and for suppressing noise. Here, the L channel filters connected to the microphones

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ And R channel filter

Figure 0004298466
Figure 0004298466

に要求される条件は、以下の2つである。1つ目は、第1の実施形態の式(12)と式(13)に示したLチャネルとRチャネルの各話者音声信号がi番目の話者位置(Xi,Yi,Zi)に対応した所望のレベル差PDiff(Xi,Yi,Zi)となる条件である。2つ目は、雑音を抑圧する条件であり、マイクロホン受音信号の雑音成分XN,1(ω)〜XN,M(ω)がフィルタ部に入力された場合に各チャネルの出力が0となる式(16)と式(17)である。 Two conditions are required of these filters. The first is the condition, shown in equations (12) and (13) of the first embodiment, that the L channel and R channel speaker voice signals have the desired level difference PDiff(Xi, Yi, Zi) corresponding to the i-th speaker position (Xi, Yi, Zi). The second is the noise suppression condition, given by equations (16) and (17), that the output of each channel becomes 0 when the noise components XN,1(ω) to XN,M(ω) of the microphone signals are input to the filter section.

Figure 0004298466
Figure 0004298466

次に、式(12)と式(16)、式(13)と式(17)の条件をフィルタ係数行列   Next, the conditions of Expression (12) and Expression (16), Expression (13), and Expression (17) are changed to the filter coefficient matrix.

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(18)と式(19)となる。ただし、CSiは音源位置の感度拘束に対する重みの定数であり、値が大きくなるほど感度拘束が強くなる。CNは雑音抑圧に対する重みの定数であり、値が大きくなるほど雑音抑圧量が増加する。 Is solved by the least squares solution, equations (18) and (19) are obtained. However, C Si is a constant of weight for the sensitivity constraint of the sound source position, and the sensitivity constraint becomes stronger as the value increases. C N is a constant of the weight for noise suppression, and the noise suppression amount increases as the value increases.

Figure 0004298466
Figure 0004298466

以上で、雑音を抑圧し、所望のレベル差を得るためのフィルタ係数を求める式(18)と式(19)を導出した。   Thus, equations (18) and (19) for obtaining filter coefficients for suppressing noise and obtaining a desired level difference are derived.

次に、式(18)と式(19)により求められた、Lチャネルフィルタ係数   Next, the L channel filter coefficient obtained by Expression (18) and Expression (19)

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ係数 And R channel filter coefficients

Figure 0004298466
Figure 0004298466

は、Lチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMにそれぞれコピーされ、マイクロホン受音信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 Are copied to the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM , respectively, and respectively filter the microphone reception signals. The filtered signals are added by adders 103 L and 103 R for each channel, and output as a stereo output signal.

以上示したように、本実施形態では、複数マイクロホンの受音信号から、話者位置と雑音区間を検出し、共分散行列を求め、雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、雑音を抑圧し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。 As described above, in this embodiment the speaker positions and noise sections are detected from the signals received by a plurality of microphones, covariance matrices are obtained, filter coefficients are obtained that suppress noise and give the desired level difference between the L and R channels for each speaker position, and the microphone signals are filtered for each channel with these filter coefficients, so that noise is suppressed and a stereo output signal having the desired level difference for each speaker position is obtained.

[第3の実施形態]
図2は本発明の第3の実施形態の収音装置のブロック図である。
[Third Embodiment]
FIG. 2 is a block diagram of a sound collecting apparatus according to the third embodiment of the present invention.

本実施形態の収音装置は、マイクロホン1011〜101Mと、Lチャネルフィルタ102L1〜102LMと、Rチャネルフィルタ102R1〜102RMと、Lチャネル加算器103Lと、Rチャネル加算器103Rと、送受話検出部201と、話者位置検出部105と、共分散行列計算部104と、共分散行列記憶部106と、Lチャネルフィルタ係数計算部107Lと、Rチャネルフィルタ係数計算部107Rと、Lチャネルミキシング係数設定部203Lと、Rチャネルミキシング係数設定部203Rにより構成される。 The sound collection device of this embodiment comprises microphones 1011 to 101M, L channel filters 102L1 to 102LM, R channel filters 102R1 to 102RM, an L channel adder 103L, an R channel adder 103R, a transmission/reception detection unit 201, a speaker position detection unit 105, a covariance matrix calculation unit 104, a covariance matrix storage unit 106, an L channel filter coefficient calculation unit 107L, an R channel filter coefficient calculation unit 107R, an L channel mixing coefficient setting unit 203L, and an R channel mixing coefficient setting unit 203R.

本実施形態は、第1と第2のいずれかの実施形態の収音装置にエコー(マイクロホンにより収音された受話信号)抑圧機能を加えたものである。   In the present embodiment, an echo (received signal collected by a microphone) suppression function is added to the sound collection device of any one of the first and second embodiments.

本実施形態は、受話信号とマイクロホン受音信号から受話区間、送話区間、雑音区間を検出し、送話区間であった場合に話者位置を推定し、各話者位置に対する共分散行列と雑音の共分散行列とエコーの共分散行列を保存しておき、これらを用いてLチャネルとRチャネルフィルタ係数を求め、これらのフィルタ係数で、それぞれマイクロホン受音信号をフィルタリングし加算し、Lチャネル出力信号とRチャネル出力信号を得る。これにより、雑音とエコーが抑圧され、各話者からの音声信号がLチャネルとRチャネルでレベル差を持ち、高品質な音声での通話と良好な音像の定位が実現する。 In this embodiment, reception sections, transmission sections, and noise sections are detected from the received (far-end) signals and the microphone signals, and the speaker position is estimated when a transmission section is detected. A covariance matrix for each speaker position, a noise covariance matrix, and an echo covariance matrix are stored and used to obtain the L channel and R channel filter coefficients; the microphone signals are then filtered with these filter coefficients and added to obtain an L channel output signal and an R channel output signal. Noise and echo are thereby suppressed, and the voice signal from each speaker has a level difference between the L channel and the R channel, realizing calls with high-quality speech and good sound-image localization.

まず、送受話検出部201は、マイクロホン1011〜101Mで受音したマイクロホン受音信号とLチャネルおよびRチャネルの受話信号のパワーから受話区間と発話区間と雑音区間を検出する。例えば、各チャネルの受話信号について、短時間平均パワー(0.1〜1s程度)と、長時間平均パワー(1s〜100s程度)を求め、短時間平均パワーと長時間平均パワーの比が受話区間の閾値以上だった場合に受話区間と判定する。また、それぞれのマイクロホン受音信号について、短時間平均パワー(0.1〜1s程度)と長時間平均パワー(1s〜100s程度)を求め、短時間平均パワーと長時間平均パワーの比が雑音区間の閾値未満の場合に雑音区間と判定し、発話の閾値以上の場合に発話区間と判定する。 First, the transmission/reception detection unit 201 detects reception sections, speech sections, and noise sections from the power of the microphone signals received by the microphones 1011 to 101M and of the L channel and R channel received signals. For example, for the received signal of each channel, the short-time average power (about 0.1 to 1 s) and the long-time average power (about 1 s to 100 s) are obtained, and when the ratio of the short-time to the long-time average power is at or above the reception threshold, the section is judged to be a reception section. Likewise, for each microphone signal the short-time average power (about 0.1 to 1 s) and the long-time average power (about 1 s to 100 s) are obtained; when the ratio of the short-time to the long-time average power is below the noise threshold the section is judged to be a noise section, and when it is at or above the speech threshold it is judged to be a speech section.

話者位置検出部105は、送受話検出部201で発話区間と判定された場合に、第1の実施形態と同様にして、話者位置を検出する。   The speaker position detection unit 105 detects the speaker position in the same manner as in the first embodiment when the transmission / reception detection unit 201 determines that it is an utterance section.

次に、共分散行列計算部104は、第1の実施形態と同様にして、共分散行列   Next, the covariance matrix calculation unit 104 performs the covariance matrix in the same manner as in the first embodiment.

Figure 0004298466
Figure 0004298466

を算出する。 Is calculated.

共分散行列記憶部106は、送受話検出部201と話者位置検出部105の検出結果に基づき、共分散行列   The covariance matrix storage unit 106 is based on the detection results of the transmission / reception detection unit 201 and the speaker position detection unit 105, and uses a covariance matrix.

Figure 0004298466
Figure 0004298466

を、各音源位置に対する共分散行列 Is the covariance matrix for each source location

Figure 0004298466
Figure 0004298466

と雑音の共分散行列 And noise covariance matrix

Figure 0004298466
Figure 0004298466

とエコーの共分散行列 And echo covariance matrix

Figure 0004298466
Figure 0004298466

として保存する。 Save as.

Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を所望のレベル差で収音し、エコーと雑音を抑圧するためのフィルタ係数を計算する。ここで、各マイクロホンに接続されたLチャネルフィルタ The L channel filter coefficient calculation unit 107L and the R channel filter coefficient calculation unit 107R calculate filter coefficients for picking up the sound emitted by each speaker with the desired level difference and for suppressing echo and noise. Here, the L channel filters connected to the microphones

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ And R channel filter

Figure 0004298466
Figure 0004298466

に要求される条件は、以下の3つである。1つ目は、第1の実施形態の式(12)と式(13)に示したLチャネルとRチャネルの各話者音声信号がi番目の話者位置(Xi,Yi,Zi)に対応した所望のレベル差PDiff(Xi,Yi,Zi)となる条件である。2つ目は、第2の実施形態の式(16)と式(17)で示した雑音を抑圧する条件である。3つ目の条件は、エコー成分を抑圧する条件であり、マイクロホン受音信号のエコー成分XE,1(ω)〜XE,M(ω)がフィルタ部に入力された場合に各チャネルの出力が0となる式(20)と式(21)である。 Three conditions are required of these filters. The first is the condition, shown in equations (12) and (13) of the first embodiment, that the L channel and R channel speaker voice signals have the desired level difference PDiff(Xi, Yi, Zi) corresponding to the i-th speaker position (Xi, Yi, Zi). The second is the noise suppression condition shown in equations (16) and (17) of the second embodiment. The third is the echo suppression condition, given by equations (20) and (21), that the output of each channel becomes 0 when the echo components XE,1(ω) to XE,M(ω) of the microphone signals are input to the filter section.

Figure 0004298466
Figure 0004298466

次に、式(12)と式(16)と式(20)、式(13)と式(17)と式(21)の条件をフィルタ係数行列   Next, the conditions of Expression (12), Expression (16), Expression (20), Expression (13), Expression (17), and Expression (21) are set as the filter coefficient matrix.

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(22)と式(23)となる。ただし、CSiは音源位置の感度拘束に対する重みの定数であり、値が大きくなるほど感度拘束が強くなる。CNは雑音抑圧に対する重みの定数であり、値が大きくなるほど雑音抑圧量が増加する。CEはエコー抑圧に対する重みの定数であり、値が大きくなるほどエコー抑圧量が増加する。 Is solved by the least squares solution, equations (22) and (23) are obtained. However, C Si is a constant of weight for the sensitivity constraint of the sound source position, and the sensitivity constraint becomes stronger as the value increases. C N is a constant of the weight for noise suppression, and the noise suppression amount increases as the value increases. CE is a constant of weight for echo suppression, and the echo suppression amount increases as the value increases.

Figure 0004298466
Figure 0004298466

以上で、雑音とエコーを抑圧し、所望のレベル差を得るためのフィルタ係数を求める式(22)と式(23)を導出した。   As described above, the expressions (22) and (23) for obtaining the filter coefficients for suppressing the noise and the echo and obtaining the desired level difference are derived.
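Equations (18), (19), (22), and (23) are likewise shown only as images. Assuming that the noise and echo conditions enter the same weighted least-squares problem as zero-target penalty terms with weights C_N and C_E, the earlier filter_coefficients() sketch only needs the noise and echo covariance matrices added to the matrix being inverted; setting C_E = 0 recovers the noise-only case of the second embodiment. Again, this is an assumed form, not the patent's exact formula.

import numpy as np

def filter_coefficients_suppress(R_list, a_list, C_list, R_noise, C_N, R_echo, C_E):
    """Assumed per-bin solution:
       H = ( sum_i C_Si R_Si + C_N R_N + C_E R_E )^-1 ( sum_i C_Si R_Si a_i ).
    The noise and echo terms have a zero target output, so they contribute
    nothing to the right-hand side."""
    n_bins, M, _ = R_noise.shape
    H = np.zeros((n_bins, M), dtype=complex)
    for k in range(n_bins):
        A = C_N * R_noise[k] + C_E * R_echo[k]
        b = np.zeros(M, dtype=complex)
        for R, a, C in zip(R_list, a_list, C_list):
            A += C * R[k]
            b += C * (R[k] @ a)
        H[k] = np.linalg.solve(A + 1e-9 * np.eye(M), b)
    return H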

次に、式(22)と式(23)により求められた、Lチャネルフィルタ係数   Next, the L channel filter coefficient obtained by Equation (22) and Equation (23)

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ係数 And R channel filter coefficients

Figure 0004298466
Figure 0004298466

は、Lチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMにそれぞれコピーされ、マイクロホン受音信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 are copied to the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM , respectively, and filter the respective microphone signals. The filtered signals are added by the adders 103 L and 103 R for each channel and output as a stereo output signal.
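A minimal sketch of this filter-and-add stage, assuming frequency-domain processing with per-bin complex coefficients (names and shapes are illustrative only):

```python
import numpy as np

def filter_and_add(X, H_L, H_R):
    """X: (N_FREQ, M) microphone spectra; H_L, H_R: (N_FREQ, M) filter coefficients.

    Each microphone spectrum is multiplied by its per-channel coefficient and the
    products are summed per channel, giving the L and R output spectra of one frame.
    """
    y_L = np.sum(H_L * X, axis=1)   # L channel adder output
    y_R = np.sum(H_R * X, axis=1)   # R channel adder output
    return y_L, y_R
```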

以上示したように、本実施形態では、複数マイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。   As described above, in this embodiment, the reception, noise, and transmission intervals are detected from the signals received by the plurality of microphones; the speaker position is detected during the transmission interval; covariance matrices are obtained; filter coefficients are computed that suppress echo and noise and give the desired level difference between the L and R channels for each speaker position; and the microphone signals are filtered per channel with these coefficients. As a result, echo and noise are suppressed and a stereo output signal having the desired level difference for each speaker position is obtained.

[第4の実施形態]
図3は本発明の第4の実施形態の収音装置のブロック図である。
[Fourth Embodiment]
FIG. 3 is a block diagram of a sound collecting apparatus according to the fourth embodiment of the present invention.

本実施形態の収音装置は、マイクロホン1011〜101Mと、Lチャネルフィルタ102L1〜102LM、301LL、301LRと、Rチャネルフィルタ102R1〜102RM、301RL、301RRと、Lチャネル加算器103Lと、Rチャネル加算器103Rと、送受話検出部201と、話者位置検出部105と、共分散行列計算部104と、共分散行列記憶部106と、Lチャネルフィルタ係数計算部107Lと、Rチャネルフィルタ係数計算部107Rと、Lチャネルミキシング係数設定部203Lと、Rチャネルミキシング係数設定部203Rにより構成される。 The sound collection device of this embodiment comprises microphones 101 1 to 101 M , L channel filters 102 L1 to 102 LM , 301 LL , and 301 LR , R channel filters 102 R1 to 102 RM , 301 RL , and 301 RR , an L channel adder 103 L , an R channel adder 103 R , a transmission/reception detection unit 201, a speaker position detection unit 105, a covariance matrix calculation unit 104, a covariance matrix storage unit 106, an L channel filter coefficient calculation unit 107 L , an R channel filter coefficient calculation unit 107 R , an L channel mixing coefficient setting unit 203 L , and an R channel mixing coefficient setting unit 203 R .

本実施形態は、第3の実施形態の収音装置のLチャネルフィルタとRチャネルフィルタに受話信号をフィルタリングするフィルタ301LL、301LR、301RL、301RRを追加した構成であり、第3の実施形態よりもさらに高いエコー抑圧を実現する。 This embodiment adds filters 301 LL , 301 LR , 301 RL , and 301 RR for filtering the received signals to the L channel filters and R channel filters of the sound collection device of the third embodiment, and realizes even higher echo suppression than the third embodiment.

まず、送受話検出部201は、第3の実施形態と同様に受話区間と発話区間と雑音区間を検出する。   First, the transmission / reception detecting unit 201 detects a reception section, a speech section, and a noise section as in the third embodiment.

話者位置検出部105は、送受話検出部201で発話区間と判定された場合に、第1の実施形態と同様にして、話者位置を検出する。   The speaker position detection unit 105 detects the speaker position in the same manner as in the first embodiment when the transmission / reception detection unit 201 determines that it is an utterance section.

次に、共分散行列計算部104は、受話信号も含めた共分散行列   Next, the covariance matrix calculation unit 104 calculates a covariance matrix that also includes the received signals.

Figure 0004298466
Figure 0004298466

を算出する。マイクロホン受音信号の周波数領域変換信号をX1(ω)〜XM(ω)とし、LチャネルとRチャネルの受話信号の周波数領域変換信号をそれぞれZL(ω)とZR(ω)とする。これらの信号の共分散行列 Is calculated. Let the frequency-domain transforms of the microphone signals be X 1 (ω) to X M (ω), and let the frequency-domain transforms of the L channel and R channel received signals be Z L (ω) and Z R (ω), respectively. The covariance matrix of these signals

Figure 0004298466
Figure 0004298466

は、式(24)により算出される。 Is calculated by equation (24).

Figure 0004298466
Figure 0004298466
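Equation (24) is only an image in this extraction, but the surrounding text describes a covariance taken over the microphone spectra stacked together with the two received-channel spectra. A hedged sketch of that construction (shapes and names are assumptions of this example):

```python
import numpy as np

def augmented_covariance(X, Z_L, Z_R):
    """X: (N_FREQ, M) mic spectra; Z_L, Z_R: (N_FREQ,) received-signal spectra.

    Returns an (N_FREQ, M+2, M+2) array: for each bin, the outer product of the
    stacked vector [X_1 .. X_M, Z_L, Z_R] with its conjugate transpose.
    """
    v = np.concatenate([X, Z_L[:, None], Z_R[:, None]], axis=1)   # (N_FREQ, M+2)
    return v[:, :, None] * np.conj(v[:, None, :])
```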

次に、共分散行列記憶部106は、送受話検出部201と話者位置検出部105の検出結果に基づき、共分散行列   Next, based on the detection results of the transmission/reception detection unit 201 and the speaker position detection unit 105, the covariance matrix storage unit 106 stores the covariance matrix

Figure 0004298466
Figure 0004298466

を、各音源位置に対する共分散行列 Is the covariance matrix for each source location

Figure 0004298466
Figure 0004298466

と雑音の共分散行列 And noise covariance matrix

Figure 0004298466
Figure 0004298466

とエコーの共分散行列 And echo covariance matrix

Figure 0004298466
Figure 0004298466

として保存する。 Save as.

次に、Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を所望のレベル差で収音し、エコーと雑音を抑圧するためのフィルタ係数を計算する。まず、各マイクロホンに接続されたLチャネルフィルタ102L1〜102LMとRチャネルフィルタ102R1〜102RMのフィルタ係数を周波数領域に変換したものを、それぞれHL,1(ω)〜HL,M(ω)とHR,1(ω)〜HR,M(ω)とし、LおよびRチャネル受話信号をフィルタリングするためのLチャネルフィルタをFL,L(ω)とFL,R(ω)とし、LおよびRチャネル受話信号をフィルタリングするためのRチャネルフィルタをFR,L(ω)とFR,R(ω)とする。次に、これらのフィルタ係数を式(25)と式(26)により行列としたものを Next, the L channel filter coefficient calculation unit 107 L and the R channel filter coefficient calculation unit 107 R calculate filter coefficients for picking up the sound emitted from each speaker with the desired level difference and for suppressing echo and noise. First, let the frequency-domain versions of the filter coefficients of the L channel filters 102 L1 to 102 LM and the R channel filters 102 R1 to 102 RM connected to the microphones be H L,1 (ω) to H L,M (ω) and H R,1 (ω) to H R,M (ω), let the L channel filters for filtering the L and R channel received signals be F L,L (ω) and F L,R (ω), and let the R channel filters for filtering the L and R channel received signals be F R,L (ω) and F R,R (ω). These filter coefficients are then arranged into matrices by equations (25) and (26)

Figure 0004298466
Figure 0004298466

とする。 And

Figure 0004298466
Figure 0004298466

ここで、Lチャネルフィルタ   Where L channel filter

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ And R channel filter

Figure 0004298466
Figure 0004298466

に要求される条件は、以下の3つである。 The following three conditions are required.

1つ目は、LチャネルとRチャネルの出力信号の各話者音声信号がi番目の話者位置(Xi,Yi,Zi)に対応した所望のレベル差PDiff(Xi,Yi,Zi)となる条件である。第1の実施形態と同様に、この条件は式(26)と式(27)で表される。 The first is the condition that, in the L channel and R channel output signals, each speaker's speech signal takes the desired level difference P Diff (X i , Y i , Z i ) corresponding to the i-th speaker position (X i , Y i , Z i ). As in the first embodiment, this condition is expressed by equations (26) and (27).

Figure 0004298466
Figure 0004298466

ただし、ミキシング係数行列   However, the mixing coefficient matrix

Figure 0004298466
Figure 0004298466

は、1〜M番目の要素が第1の実施形態のミキシング係数行列 Where the 1st to Mth elements are the mixing coefficient matrix of the first embodiment.

Figure 0004298466
Figure 0004298466

と同様に設定され、M+1とM+2番目の要素が0であるM+2行1列の行列である。 Is an M + 2 × 1 matrix where the M + 1 and M + 2th elements are 0.
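A small sketch of the extended mixing-coefficient vector just described, assuming the first M entries are the per-microphone mixing coefficients for speaker i and the two entries corresponding to the received-signal inputs are zero:

```python
import numpy as np

def extended_mixing_vector(a_mic):
    """a_mic: (M,) mixing coefficients -> (M+2,) vector with two zeros appended."""
    return np.concatenate([a_mic, np.zeros(2, dtype=a_mic.dtype)])
```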

2つ目は、雑音を抑圧する条件である。マイクロホン受音信号のエコー成分XE,1(ω)〜XE,M(ω)がフィルタ部に入力された場合に各チャネルの出力が0となる式(28)と式(29)が、その条件となる。 The second is the condition for suppressing noise: equations (28) and (29), which require the output of each channel to be 0 when the echo components X E,1 (ω) to X E,M (ω) of the microphone signals are input to the filter unit.

Figure 0004298466
Figure 0004298466

3つ目の条件は、エコー成分を抑圧する条件である。マイクロホン受音信号のエコー成分XE,1(ω)〜XE,M(ω)と受話信号ZL(ω)、ZR(ω)がフィルタ部に入力された場合に各チャネルの出力が0となる式(30)と式(31)がその条件となる。 The third is the condition for suppressing the echo component: equations (30) and (31), which require the output of each channel to be 0 when the echo components X E,1 (ω) to X E,M (ω) of the microphone signals and the received signals Z L (ω) and Z R (ω) are input to the filter unit.

Figure 0004298466
Figure 0004298466

次に、式(26)と式(28)と式(30)、式(27)と式(29)と式(31)の条件をフィルタ係数行列   Next, the conditions of equations (26), (28), and (30), and of equations (27), (29), and (31), are solved in the least-squares sense for the filter coefficient matrix

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(32)と式(33)となる。ただし、CSiは音源位置の感度拘束に対する重みの定数であり、値が大きくなるほど感度拘束が強くなる。CNは雑音抑圧に対する重みの定数であり、値が大きくなるほど雑音抑圧量が増加する。CEはエコー抑圧に対する重みの定数であり、値が大きくなるほどエコー抑圧量が増加する。 This yields equations (32) and (33). Here, C Si is a weighting constant for the sensitivity constraint at each sound source position; the larger its value, the stronger the sensitivity constraint. C N is a weighting constant for noise suppression; the larger its value, the greater the amount of noise suppression. C E is a weighting constant for echo suppression; the larger its value, the greater the amount of echo suppression.

Figure 0004298466
Figure 0004298466

以上で、雑音とエコーを抑圧し、所望のレベル差を得るためのフィルタ係数を求める式(32)と式(33)を導出した。   Thus, the equations (32) and (33) for obtaining the filter coefficient for suppressing the noise and the echo and obtaining a desired level difference are derived.

次に、式(32)と式(33)により求められた、Lチャネルフィルタ係数   Next, the L channel filter coefficient obtained by Expression (32) and Expression (33)

Figure 0004298466
Figure 0004298466

とRチャネルフィルタ係数 And R channel filter coefficients

Figure 0004298466
Figure 0004298466

は、Lチャネルフィルタ102L1〜102LM、301LL、301LRと、Rチャネルフィルタ102R1〜102RM、301RL、301RRにそれぞれコピーされ、マイクロホン受音信号と受話信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 Are copied to the L channel filters 102 L1 to 102 LM , 301 LL and 301 LR and the R channel filters 102 R1 to 102 RM , 301 RL and 301 RR , respectively, and filter the microphone sound reception signal and the reception signal, respectively. The filtered signals are added by adders 103 L and 103 R for each channel, and output as a stereo output signal.
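A minimal sketch of the augmented filter-and-add stage of this embodiment, in which the M microphone spectra and the two received-signal spectra are filtered together before the per-channel summation (shapes and names are assumptions of this example):

```python
import numpy as np

def filter_and_add_with_reference(X, Z_L, Z_R, W_L, W_R):
    """X: (N_FREQ, M) mic spectra; Z_L, Z_R: (N_FREQ,) received spectra.

    W_L, W_R: (N_FREQ, M+2) filters; the first M columns play the role of
    H_{L,m} / H_{R,m} and the last two that of F_{L,L}, F_{L,R} / F_{R,L}, F_{R,R}.
    """
    v = np.concatenate([X, Z_L[:, None], Z_R[:, None]], axis=1)   # (N_FREQ, M+2)
    y_L = np.sum(W_L * v, axis=1)
    y_R = np.sum(W_R * v, axis=1)
    return y_L, y_R
```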

以上示したように、本実施形態では、複数マイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与えるフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。また、第3の実施形態の収音装置に受話信号をフィルタリングするフィルタを追加したことにより、第3の実施形態の収音装置よりも高いエコー抑圧が実現する。   As described above, in this embodiment, the reception, noise, and transmission intervals are detected from the signals received by the plurality of microphones; the speaker position is detected during the transmission interval; covariance matrices are obtained; filter coefficients are computed that suppress echo and noise and give the desired level difference between the L and R channels for each speaker position; and the microphone signals are filtered per channel with these coefficients. As a result, echo and noise are suppressed and a stereo output signal having the desired level difference for each speaker position is obtained. Furthermore, by adding filters for filtering the received signals to the sound collection device of the third embodiment, higher echo suppression is realized than with the sound collection device of the third embodiment.

[第5の実施形態]
図4は本発明の第5の実施形態の収音装置の要部のブロック図である。
[Fifth Embodiment]
FIG. 4 is a block diagram of a main part of a sound collecting apparatus according to the fifth embodiment of the present invention.

本実施形態の収音装置は、第1〜4の実施形態のいずれかの収音装置に話者音声レベル推定部108とゲイン算出部109を追加した構成である。話者音声レベル推定部108は、各話者の共分散行列より各話者の音声レベルを推定し、ゲイン算出部109は、各話者の音声レベルから、各話者の出力音声レベルが適正レベルとなるゲインを算出する。これにより、全ての話者の音声レベルを適正レベルとし、聞き取りやすい音量での収音を実現する。また、第1〜4の実施形態のいずれかの収音装置と同様に、エコー抑圧、雑音抑圧、チャネル間のレベル差をつけた良好な音像定位も同時に実現する。   The sound collection device of this embodiment has a configuration in which a speaker voice level estimation unit 108 and a gain calculation unit 109 are added to any of the sound collection devices of the first to fourth embodiments. The speaker voice level estimation unit 108 estimates the voice level of each speaker from that speaker's covariance matrix, and the gain calculation unit 109 calculates, from each speaker's voice level, a gain that brings that speaker's output voice level to the appropriate level. As a result, the voice levels of all speakers are brought to an appropriate level, realizing sound pickup at a volume that is easy to hear. In addition, as in the sound collection devices of the first to fourth embodiments, echo suppression, noise suppression, and good sound image localization with a level difference between the channels are realized at the same time.

まず、送受話検出部201、話者位置検出部105、共分散行列計算部104、共分散記憶部106については、第1から第4の実施形態のいずれかの収音装置と同様な処理となる。   First, the transmission/reception detection unit 201, the speaker position detection unit 105, the covariance matrix calculation unit 104, and the covariance matrix storage unit 106 operate in the same manner as in any of the sound collection devices of the first to fourth embodiments.

話者音声レベル推定部108は、各話者の音声レベルPSiを、共分散行列記憶部106に記憶されている各話者位置に対する共分散行列 The speaker voice level estimation unit 108 estimates the voice level P Si of each speaker from the covariance matrix, stored in the covariance matrix storage unit 106, for each speaker position

Figure 0004298466
Figure 0004298466

と、話者位置ごとのシングルチャネルミキシング係数行列 And a single-channel mixing coefficient matrix for each speaker location

Figure 0004298466
Figure 0004298466

から式(34)または式(35)を用いて求められる。シングルチャネルミキシング係数行列 Is obtained using equation (34) or equation (35). Single channel mixing coefficient matrix

Figure 0004298466
Figure 0004298466

には、例えばミキシング係数行列 For example, a mixing coefficient matrix

Figure 0004298466
Figure 0004298466

を加算平均した行列、またはミキシング係数行列 a matrix obtained by averaging them, or the mixing coefficient matrix

Figure 0004298466
Figure 0004298466

を加算平均した行列を用いる。 A matrix obtained by averaging is used.

Figure 0004298466
Figure 0004298466

次に、ゲイン算出部109は、各話者音声レベルPSiを適正レベルPopt(聞き取りやすいレベルで、あらかじめ設定される)にするためのゲインASiを算出する。ゲインASiは、式(36)により求めることができる。 Next, the gain calculation unit 109 calculates a gain A Si for setting each speaker voice level P Si to an appropriate level P opt (a level that is easy to hear and set in advance). Gain A Si can be obtained by equation (36).

Figure 0004298466
Figure 0004298466
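Equations (34) to (36) appear only as images here; the sketch below shows one plausible reading of the level-alignment step described above: the per-speaker power is read off the stored covariance through a single-channel mixing vector, and a gain then scales that power to the preset target level P opt. The function names and the exact quadratic form are assumptions of this example.

```python
import numpy as np

def speaker_level(R_speaker, m):
    """R_speaker: (N_FREQ, M, M) stored covariance; m: (M,) single-channel mixing vector.

    Returns the estimated speaker power, summed over frequency, as m^H R(w) m.
    """
    return float(np.real(np.einsum("i,fij,j->", np.conj(m), R_speaker, m)))

def level_gain(p_speaker, p_opt):
    """Amplitude gain that brings the estimated speaker power to the target power."""
    return float(np.sqrt(p_opt / max(p_speaker, 1e-12)))
```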

Lチャネルフィルタ係数算出部107LとRチャネルフィルタ係数算出部107Rは、各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とし、エコーと雑音を抑圧するためのフィルタ係数を計算する。エコーと雑音を抑圧する条件は、第1〜4の実施形態のいずれかの収音装置と同様である。各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とする条件は、式(37)と式(38)、または式(39)と式(40)で表される。 The L channel filter coefficient calculation unit 107 L and the R channel filter coefficient calculation unit 107 R calculate filter coefficients for picking up the sound emitted from each speaker at the appropriate level, giving the sound emitted from each speaker the desired level difference between the L and R channels, and suppressing echo and noise. The conditions for suppressing echo and noise are the same as in any of the sound collection devices of the first to fourth embodiments. The conditions for picking up the sound emitted from each speaker at the appropriate level and giving it the desired level difference between the L and R channels are expressed by equations (37) and (38), or by equations (39) and (40).

Figure 0004298466
Figure 0004298466

次に、第1〜4の実施形態のいずれかの収音装置のエコーと雑音を抑圧する条件と、各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とする条件である式(37)と式(38)、または式(39)と式(40)を、フィルタ係数行列   Next, the echo and noise suppression conditions of any of the sound collection devices of the first to fourth embodiments, together with equations (37) and (38), or equations (39) and (40), which are the conditions for picking up the sound emitted from each speaker at the appropriate level and giving it the desired level difference between the L and R channels, are solved for the filter coefficient matrix

Figure 0004298466
Figure 0004298466

または Or

Figure 0004298466
Figure 0004298466

について最小二乗解で解けば、式(41)と式(42)、または式(43)と式(44)、または式(45)と式(46)、または式(47)と式(48)となる。 Solving in the least-squares sense yields equations (41) and (42), equations (43) and (44), equations (45) and (46), or equations (47) and (48).

Figure 0004298466
Figure 0004298466

ただし、式(41)と式(42)は各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とする条件で求めたフィルタ係数、式(43)と式(44)は各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とし、雑音を抑圧する条件で求めたフィルタ係数、式(45)と式(46)は、各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とし、エコーと雑音を抑圧する条件で求めたフィルタ係数、式(47)と式(48)は、受話信号をフィルタリングするフィルタを持ち、各話者から発せられた音を適正レベルで収音し、各話者から発せられた音をLRチャネルで所望のレベル差とし、エコーと雑音を抑圧する条件で求めたフィルタ係数である。   Here, equations (41) and (42) are the filter coefficients obtained under the condition that the sound emitted from each speaker is picked up at the appropriate level and given the desired level difference between the L and R channels; equations (43) and (44) are the filter coefficients obtained under that condition with noise suppression added; equations (45) and (46) are the filter coefficients obtained under that condition with echo and noise suppression added; and equations (47) and (48) are the filter coefficients obtained, with filters for filtering the received signals, under that condition with echo and noise suppression added.

次に、式(41)と式(42)、または式(43)と式(44)、または式(45)と式(46)、または式(47)と式(48)により求められた、Lチャネルフィルタ係数とRチャネルフィルタ係数は、Lチャネルフィルタ102L1〜102LM、301LL、301LRと、Rチャネルフィルタ102R1〜102RM、301RL、301RRにそれぞれコピーされ、マイクロホン受音信号と受話信号をそれぞれフィルタリングする。フィルタリング後の信号は、チャネルごとに加算器103Lと103Rで加算され、ステレオ出力信号として出力される。 Next, the L channel filter coefficients and R channel filter coefficients obtained from equations (41) and (42), equations (43) and (44), equations (45) and (46), or equations (47) and (48) are copied to the L channel filters 102 L1 to 102 LM , 301 LL , and 301 LR and to the R channel filters 102 R1 to 102 RM , 301 RL , and 301 RR , respectively, and filter the microphone signals and the received signals. The filtered signals are added by the adders 103 L and 103 R for each channel and output as a stereo output signal.

以上示したように、本実施形態では、複数のマイクロホンの受音信号から受話区間と雑音区間と送話区間を検出し、送話区間では話者位置を検出し、共分散行列を求め、エコー抑圧と雑音抑圧と話者位置ごとに所望のLRチャネル間のレベル差を与え、各話者から発せられた音を適正レベルで収音するフィルタ係数を求め、それらのフィルタ係数でマイクロホン受音信号をチャネルごとにフィルタリングすることで、エコーと雑音を抑圧し、各話者から発せられた音を適正レベルで収音し、話者位置ごとに所望のレベル差をもったステレオ出力信号を得ることができる。   As described above, in this embodiment, the reception, noise, and transmission intervals are detected from the signals received by the plurality of microphones; the speaker position is detected during the transmission interval; covariance matrices are obtained; filter coefficients are computed that suppress echo and noise, give the desired level difference between the L and R channels for each speaker position, and pick up the sound emitted from each speaker at the appropriate level; and the microphone signals are filtered per channel with these coefficients. As a result, echo and noise are suppressed, the sound emitted from each speaker is picked up at the appropriate level, and a stereo output signal having the desired level difference for each speaker position is obtained.

[第6の実施形態]
図5は本発明の第6の実施形態の収音装置の要部のブロック図である。
[Sixth Embodiment]
FIG. 5 is a block diagram of a main part of a sound collecting apparatus according to the sixth embodiment of the present invention.

本実施形態の収音装置は、第1〜5の実施形態のいずれかの収音装置に白色化部110を追加した構成である。白色化部110は、各共分散行列をLRチャネルフィルタ係数計算部107L,107Rの前段階で白色化する。これにより、マイクロホンで受音された信号の周波数特性に関与しないフィルタ係数が求められ、安定した動作が可能となる。 The sound collection device of this embodiment has a configuration in which a whitening unit 110 is added to any one of the sound collection devices of the first to fifth embodiments. The whitening unit 110 whitens each covariance matrix before it is passed to the L and R channel filter coefficient calculation units 107 L and 107 R . As a result, filter coefficients that do not depend on the frequency characteristics of the signals received by the microphones are obtained, enabling stable operation.

白色化は、共分散行列   Whitening is the covariance matrix

Figure 0004298466
Figure 0004298466

の対角成分のうち最もパワーの大きいRkk(ω)を平均化する白色化ゲイン Whitening gain that averages R kk (ω) with the largest power among the diagonal components of

Figure 0004298466
Figure 0004298466

を乗算するか、共分散行列の対角成分の平均パワーを平均化する白色化ゲイン Or a whitening gain that averages the mean power of the diagonal components of the covariance matrix

Figure 0004298466
Figure 0004298466

を乗算することで行う。これらは、それぞれ式(49)または式(50)と、式(51)または式(52)により表される。 This is done by multiplying by one of these gains. They are expressed by equations (49) or (50) and by equations (51) or (52), respectively.

Figure 0004298466
Figure 0004298466

ただし、βは白色化の度合いを調整する係数であり、1となれば完全な白色化となり、0となれば白色化は行われなくなる。   Here, β is a coefficient that adjusts the degree of whitening: a value of 1 gives complete whitening, and a value of 0 gives no whitening.
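Equations (49) to (52) are images in this extraction; the sketch below is one plausible reading of the whitening step: a per-frequency gain that flattens either the largest diagonal entry or the mean diagonal power of the stored covariance, with beta between 0 and 1 controlling how complete the whitening is. The exponent form of the gain is an assumption of this example.

```python
import numpy as np

def whiten_covariance(R, beta=1.0, use_max=True, eps=1e-12):
    """R: (N_FREQ, M, M) stored covariance; returns a whitened copy."""
    diag = np.real(np.einsum("fii->fi", R))              # diagonal powers per bin
    p = diag.max(axis=1) if use_max else diag.mean(axis=1)
    gain = (p + eps) ** (-beta)                          # beta=1: full whitening, 0: none
    return R * gain[:, None, None]
```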

第5の実施形態の収音装置では、共分散行列の白色化により、音源の周波数特性に依存しないフィルタを求めることができる。これにより、音源の周波数特性が変化しても、フィルタ係数の変化がなく、本発明の処理による音色の変化を防ぐことができる。   In the sound collection device of the fifth embodiment, a filter independent of the frequency characteristics of the sound source can be obtained by whitening the covariance matrix. Thereby, even if the frequency characteristic of the sound source changes, there is no change in the filter coefficient, and the change in timbre due to the processing of the present invention can be prevented.

これら以外の部分に関しては、第1〜5の実施形態のいずれかの収音装置と同じであるので、説明を省略する。   Since other parts are the same as those of any of the sound collecting apparatuses of the first to fifth embodiments, the description thereof is omitted.

[第7〜9の実施形態]
図6〜図8はそれぞれ本発明の第7、第8、第9の実施形態の収音装置のブロック図である。
[Seventh to ninth embodiments]
6 to 8 are block diagrams of sound collecting apparatuses according to seventh, eighth and ninth embodiments of the present invention, respectively.

本実施形態の収音装置は、第1〜6の実施形態のいずれかの収音装置にFFT部4011〜401M、501L、501RとIFFT部402L、402Rを追加した構成である。FFT部4011〜401Mは、各マイクロホン受音信号をそれぞれFFTして、時間領域信号から周波数領域信号に変換する。FFT部501L、501Rは、LRチャネル受話信号をそれぞれFFTして、時間領域信号から周波数領域信号に変換する。IFFT部402L、402Rは、LRチャネル加算器103L、103Rの出力信号をそれぞれIFFTして周波数領域信号から時間領域信号に変換する。また、第1〜第6の実施形態の各処理部は全て周波数領域での計算とし、時間領域の演算に比べ低演算量を実現する。 The sound collection device of this embodiment has a configuration in which FFT units 401 1 to 401 M , 501 L , and 501 R and IFFT units 402 L and 402 R are added to any of the sound collection devices of the first to sixth embodiments. The FFT units 401 1 to 401 M apply an FFT to each microphone signal to convert it from a time-domain signal to a frequency-domain signal. The FFT units 501 L and 501 R apply an FFT to the L and R channel received signals to convert them from time-domain signals to frequency-domain signals. The IFFT units 402 L and 402 R apply an IFFT to the output signals of the L and R channel adders 103 L and 103 R to convert them from frequency-domain signals to time-domain signals. All processing units of the first to sixth embodiments then operate in the frequency domain, which requires less computation than operating in the time domain.
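A minimal sketch of this frequency-domain wrapper, assuming a windowed frame FFT, per-bin filtering as in the earlier embodiments, and an inverse FFT back to the time domain; the frame length and window are assumptions of this example.

```python
import numpy as np

FRAME = 512                 # frame length (assumed)
window = np.hanning(FRAME)  # analysis window (assumed)

def process_frame(x_frame, H_L, H_R):
    """x_frame: (FRAME, M) time-domain mic samples; H_L, H_R: (FRAME//2+1, M) filters."""
    X = np.fft.rfft(window[:, None] * x_frame, axis=0)   # FFT stage per microphone
    y_L = np.sum(H_L * X, axis=1)                        # filtering + L channel adder
    y_R = np.sum(H_R * X, axis=1)                        # filtering + R channel adder
    return np.fft.irfft(y_L, n=FRAME), np.fft.irfft(y_R, n=FRAME)   # IFFT stage
```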

これら以外の部分に関しては、第1〜6の実施形態のいずれかの収音装置と同じであるので、説明を省略する。   Since other parts are the same as those of any of the sound collecting apparatuses of the first to sixth embodiments, the description thereof is omitted.

[第10の実施形態]
本発明の第10の実施形態について説明する。
[Tenth embodiment]
A tenth embodiment of the present invention will be described.

本実施形態の収音装置は、第1〜9の実施形態のいずれかの収音装置の出力チャネル数を3チャネル以上のJチャネルとしたものである。LとRチャネルフィルタ係数計算部107L、107RとLとRチャネルフィルタ102L1〜102LM、301LL、301LR、102R1〜102RM、301RL、301RRとLとRチャネル加算器103L、103Rを、3チャネル以上の1〜Jチャネルフィルタ係数計算手段と、1〜Jチャネルフィルタ手段と、1〜Jチャネル加算手段に置き換えた構成である。 The sound collection device of this embodiment sets the number of output channels of any of the sound collection devices of the first to ninth embodiments to J channels, J being three or more. In this configuration, the L and R channel filter coefficient calculation units 107 L and 107 R , the L and R channel filters 102 L1 to 102 LM , 301 LL , 301 LR , 102 R1 to 102 RM , 301 RL , and 301 RR , and the L and R channel adders 103 L and 103 R are replaced with 1st to J-th channel filter coefficient calculation means, 1st to J-th channel filter means, and 1st to J-th channel addition means for three or more channels.

出力チャネルを増加させたことにより、3チャネル以上の多チャンネルでの収音が可能となる。   By increasing the number of output channels, it is possible to collect sound on multiple channels of three or more channels.
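A small sketch of this J-channel generalization: instead of two filter banks and two adders, a (J, M) filter matrix per frequency bin produces all J output channels in one step (shapes and names are assumptions of this example).

```python
import numpy as np

def filter_and_add_multichannel(X, W):
    """X: (N_FREQ, M) mic spectra; W: (N_FREQ, J, M) filters -> (N_FREQ, J) outputs."""
    return np.einsum("fjm,fm->fj", W, X)
```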

これら以外の部分に関しては、第1〜9の実施形態のいずれかの収音装置と同じであるので、説明を省略する。   Since other parts are the same as those of any of the sound collecting apparatuses of the first to ninth embodiments, the description thereof is omitted.

なお、本発明は専用のハードウェアにより実現されるもの以外に、その機能を実現するためのプログラムを、コンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行するものであってもよい。コンピュータ読み取り可能な記録媒体とは、フロッピーディスク、光磁気ディスク、CD−ROM等の記録媒体、コンピュータシステムに内蔵されるハードディスク装置等の記憶装置を指す。さらに、コンピュータ読み取り可能な記録媒体は、インターネットを介してプログラムを送信する場合のように、短時間の間、動的にプログラムを保持するもの(伝送媒体もしくは伝送波)、その場合のサーバとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含む。   The present invention may be realized not only by dedicated hardware but also by recording a program for realizing its functions on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. A computer-readable recording medium refers to a recording medium such as a floppy disk, a magneto-optical disk, or a CD-ROM, or a storage device such as a hard disk drive built into the computer system. Furthermore, a computer-readable recording medium also includes one that holds the program dynamically for a short time (a transmission medium or transmission wave), as when the program is transmitted over the Internet, and one that holds the program for a certain period, such as the volatile memory inside the computer system that serves as the server in that case.

本発明の第1および第2の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection apparatus of the 1st and 2nd embodiment of this invention.
本発明の第3の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 3rd Embodiment of this invention.
本発明の第4の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 4th Embodiment of this invention.
本発明の第5の実施形態の収音装置の要部を示すブロック図である。It is a block diagram which shows the principal part of the sound collection device of the 5th Embodiment of this invention.
本発明の第6の実施形態の収音装置の要部を示すブロック図である。It is a block diagram which shows the principal part of the sound collection device of the 6th Embodiment of this invention.
本発明の第7の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 7th Embodiment of this invention.
本発明の第8の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 8th Embodiment of this invention.
本発明の第9の実施形態の収音装置を示すブロック図である。It is a block diagram which shows the sound collection device of the 9th Embodiment of this invention.
本発明により形成される指向性を示す図である。It is a figure which shows the directivity formed by this invention.
従来の収音方法を説明する図である。It is a figure explaining the conventional sound collection method.

符号の説明Explanation of symbols

1011〜101M マイクロホン
102L1〜102LM、301LL、301LR Lチャネルフィルタ
102R1〜102RM、301RL、301RR Rチャネルフィルタ
103L Lチャネル加算器
103R Rチャネル加算器
104 共分散行列計算部
105 話者位置検出部
106 共分散行列記憶部
107L Lチャネルフィルタ係数計算部
107R Rチャネルフィルタ係数計算部
108 話者音声レベル推定部
109 ゲイン計算部
110 白色化部
201 送受話検出部
202L Lチャネルスピーカ
202R Rチャネルスピーカ
203L Lチャネルミキシング係数設定部
203R Rチャネルミキシング係数設定部
4011〜401M、501L、501R FFT
402L、402R IFFT
901L、901R 従来技術の指向性マイクロホン
902L、902R 本発明により形成される指向特性
903 本発明の処理
101 1 to 101 M microphones 102 L1 to 102 LM , 301 LL , 301 LR L channel filters 102 R1 to 102 RM , 301 RL , 301 RR R channel filters 103 L L channel adders 103 R R channel adders 104 covariance matrix Calculation unit 105 Speaker position detection unit 106 Covariance matrix storage unit 107 L L channel filter coefficient calculation unit 107 R R channel filter coefficient calculation unit 108 Speaker voice level estimation unit 109 Gain calculation unit 110 Whitening unit 201 Transmission / reception detection unit 202 L L channel speaker 202 R R channel speaker 203 L L channel mixing coefficient setting unit 203 R R channel mixing coefficient setting unit 401 1 to 401 M , 501 L , 501 R FFT
402 L , 402 R IFFT
901 L , 901 R Conventional directional microphones 902 L , 902 R Directional characteristics formed by the present invention 903 Processing of the present invention

Claims (18)

収音方法であって、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階と
を有する収音方法。
A sound collection method,
A speaker position detection stage for detecting a speaker position from a received sound signal received by each of a plurality of sound pickup means;
A covariance matrix calculation stage for calculating a covariance matrix from the received sound signal received by each of the plurality of sound collection means;
Storing a covariance matrix for each speaker position;
An L channel mixing coefficient setting step for presetting an L channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix and the L channel mixing coefficient under the condition that each speaker voice component received by each of the plurality of sound pickup means is mixed by the L channel mixing coefficient corresponding to each speaker position. Calculating the L channel filter coefficient from the L channel filter coefficient calculating stage;
An R channel mixing coefficient setting stage for presetting an R channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix and the R channel mixing coefficient under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed by the R channel mixing coefficient corresponding to each speaker position. Calculating an R channel filter coefficient from the R channel filter coefficient calculating stage;
An L channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the L channel filter coefficient;
An R channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
An L channel addition stage for adding the output signals of the L channel filter stage;
And an R channel addition step of adding the output signals of the R channel filter step.
収音方法であって、
複数の収音手段の各々で受音された受音信号から話者位置と雑音区間を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階と
を有する収音方法。
A sound collection method,
A speaker position detection stage for detecting a speaker position and a noise section from a received signal received by each of a plurality of sound pickup means;
A covariance matrix calculation stage for calculating a covariance matrix from the received sound signal received by each of the plurality of sound collection means;
Storing a covariance matrix for each noise interval and speaker position;
An L channel mixing coefficient setting step for presetting an L channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position and the noise component is suppressed. And an L channel filter coefficient calculation step of calculating an L channel filter coefficient from the L channel mixing coefficient,
An R channel mixing coefficient setting stage for presetting an R channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, and the noise component is suppressed. And an R channel filter coefficient calculating step of calculating an R channel filter coefficient from the R channel mixing coefficient;
An L channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the L channel filter coefficient;
An R channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
An L channel addition stage for adding the output signals of the L channel filter stage;
And an R channel addition step of adding the output signals of the R channel filter step.
収音方法であって、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出段階と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階と
を有する収音方法。
A sound collection method,
A transmission / reception detection stage for detecting a transmission interval, a reception interval, and a noise interval from a reception signal received by each of a plurality of sound collection means, an L channel reception signal and an R channel reception signal from a communication partner;
A speaker position detection stage for detecting a speaker position from a received sound signal received by each of a plurality of sound pickup means;
A covariance matrix calculation stage for calculating a covariance matrix from the received sound signal received by each of the plurality of sound collection means;
Storing a covariance matrix for each reception interval, noise interval, and speaker position;
An L channel mixing coefficient setting step for presetting an L channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. Calculating an L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient;
An R channel mixing coefficient setting stage for presetting an R channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. An R channel filter coefficient calculation step of calculating an R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient;
An L channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the L channel filter coefficient;
An R channel filter stage for filtering each received signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
An L channel addition stage for adding the output signals of the L channel filter stage;
And an R channel addition step of adding the output signals of the R channel filter step.
収音方法であって、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出段階と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出段階と、
複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号から共分散行列を計算する共分散行列計算段階と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶段階と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算段階と、
あらかじめ各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定段階と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算段階と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ段階と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ段階と、
前記Lチャネルフィルタ段階の出力信号を加算するLチャネル加算段階と、
前記Rチャネルフィルタ段階の出力信号を加算するRチャネル加算段階と
を有する収音方法。
A sound collection method,
A transmission / reception detection stage for detecting a transmission interval, a reception interval, and a noise interval from a reception signal received by each of a plurality of sound collection means, an L channel reception signal and an R channel reception signal from a communication partner;
A speaker position detection stage for detecting a speaker position from a received sound signal received by each of a plurality of sound pickup means;
A covariance matrix calculating step of calculating a covariance matrix from the received sound signal received by each of the plurality of sound collecting means, the L channel received signal, and the R channel received signal;
Storing a covariance matrix for each reception interval, noise interval, and speaker position;
An L channel mixing coefficient setting step for presetting an L channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. Calculating an L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient;
An R channel mixing coefficient setting step for setting in advance an R channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. An R channel filter coefficient calculation step of calculating an R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient;
An L channel filter stage for filtering a received sound signal, an L channel received signal, and an R channel received signal received by each of the plurality of sound collecting means, respectively, with the L channel filter coefficient;
An R channel filter step of filtering a received sound signal, an L channel received signal, and an R channel received signal received by each of the plurality of sound collecting means, respectively, with the R channel filter coefficient;
An L channel addition stage for adding the output signals of the L channel filter stage;
And an R channel addition step of adding the output signals of the R channel filter step.
前記記憶された各話者の共分散行列から各話者の音声レベルを推定する話者音声レベル推定段階と、
前記各話者の音声レベルから、各話者音声が適正レベルで出力されるための各話者に対するゲインを各々算出するゲイン算出部とをさらに有し、
前記Lチャネルフィルタ係数計算段階は、さらに前記各話者に対するゲインが乗算され、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出し、
前記Rチャネルフィルタ係数計算段階は、さらに前記各話者に対するゲインが乗算され、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出する、
請求項1から4のいずれかに記載の収音方法。
A speaker voice level estimating step for estimating a voice level of each speaker from the stored covariance matrix of each speaker;
A gain calculating unit for calculating a gain for each speaker for outputting each speaker's voice at an appropriate level from the voice level of each speaker;
In the L channel filter coefficient calculation step, the gain for each speaker is further multiplied, the received signal component is suppressed, and the noise component is suppressed. From the stored covariance matrix and the L channel mixing coefficient, L channel filter coefficient is calculated,
In the R channel filter coefficient calculation step, the gain for each speaker is further multiplied, the received signal component is suppressed, and the noise component is suppressed, from the stored covariance matrix and the R channel mixing coefficient. Calculating R channel filter coefficients;
The sound collection method according to any one of claims 1 to 4.
前記記憶された共分散行列のうち対角成分で最もパワーの大きい成分、または前記記憶された共分散行列の対角成分の加算値の周波数特性を平滑化するゲインを、前記記憶された共分散行列に乗算し、白色化された共分散行列を、前記Lチャネルフィルタ係数計算段階と前記Rチャネルフィルタ係数計算段階に入力する白色化段階をさらに有する、請求項1から5のいずれかに記載の収音方法。   The sound collection method according to any one of claims 1 to 5, further comprising a whitening step of multiplying the stored covariance matrix by a gain that smooths the frequency characteristic of either the diagonal component having the largest power among the diagonal components of the stored covariance matrix or the sum of the diagonal components of the stored covariance matrix, and inputting the whitened covariance matrix to the L channel filter coefficient calculation step and the R channel filter coefficient calculation step. 前記複数の収音手段の各々で受音された信号および前記受話信号を時間領域信号から周波数領域信号に変換するFFT段階と、
前記Lチャネル加算段階と前記Rチャネル加算段階の出力信号を周波数領域信号から時間領域信号に変換するIFFT段階をさらに有し、
前記各段階は周波数領域で演算する、
請求項1から6のいずれかに記載の収音方法。
An FFT stage for converting a signal received by each of the plurality of sound pickup means and the received signal from a time domain signal to a frequency domain signal;
An IFFT stage for converting the output signal of the L channel addition stage and the R channel addition stage from a frequency domain signal to a time domain signal;
Each of the above steps is performed in the frequency domain,
The sound collection method according to any one of claims 1 to 6.
前記LおよびRチャネルフィルタ係数計算段階と前記LおよびRチャネルフィルタ段階と前記LおよびRチャネル加算段階を、3チャネル以上の1〜Jチャネルフィルタ係数計算段階と1〜Jチャネルフィルタ段階と1〜Jチャネル加算段階に置き換えた、
請求項1から7のいずれかに記載の収音方法。
The sound collection method according to any one of claims 1 to 7, wherein the L and R channel filter coefficient calculation stages, the L and R channel filter stages, and the L and R channel addition stages are replaced with 1st to J-th channel filter coefficient calculation stages, 1st to J-th channel filter stages, and 1st to J-th channel addition stages for three or more channels.
収音装置であって、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出手段と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算手段と、
前記共分散行列を話者位置ごとに記憶する共分散行列記憶手段と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算手段と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされる条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ手段と、
前記Lチャネル複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ手段と、
前記Lチャネルフィルタ手段の出力信号を加算するLチャネル加算手段と、
前記Rチャネルフィルタ手段の出力信号を加算するRチャネル加算手段と
を有する収音装置。
A sound collecting device,
Speaker position detecting means for detecting a speaker position from a received sound signal received by each of a plurality of sound collecting means;
A covariance matrix calculating means for calculating a covariance matrix from the received sound signals received by each of the plurality of sound collecting means;
Covariance matrix storage means for storing the covariance matrix for each speaker position;
L channel mixing coefficient setting means for presetting an L channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix and the L channel mixing coefficient under the condition that each speaker voice component received by each of the plurality of sound pickup means is mixed by the L channel mixing coefficient corresponding to each speaker position. L channel filter coefficient calculating means for calculating L channel filter coefficients from:
R channel mixing coefficient setting means for setting in advance an R channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix and the R channel mixing coefficient under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed by the R channel mixing coefficient corresponding to each speaker position. R channel filter coefficient calculating means for calculating R channel filter coefficients from
L channel filter means for filtering received sound signals received by each of the plurality of sound collection means, respectively, with the L channel filter coefficients;
R channel filter means for filtering the received sound signal received by each of the L channel plural sound collecting means with the R channel filter coefficient;
L channel addition means for adding the output signals of the L channel filter means;
A sound collecting device comprising: R channel adding means for adding the output signals of the R channel filter means.
収音装置であって、
複数の収音手段の各々で受音された受音信号から話者位置と雑音区間を検出する話者位置検出手段と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算手段と、
前記共分散行列を雑音区間と話者位置ごとに記憶する共分散行列記憶手段と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算手段と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ手段と、
前記Lチャネルフィルタ手段の出力信号を加算するLチャネル加算手段と、
前記Rチャネルフィルタ手段の出力信号を加算するRチャネル加算手段と
を有する収音装置。
A sound collecting device,
Speaker position detecting means for detecting a speaker position and a noise section from a received sound signal received by each of a plurality of sound collecting means;
A covariance matrix calculating means for calculating a covariance matrix from the received sound signals received by each of the plurality of sound collecting means;
Covariance matrix storage means for storing the covariance matrix for each noise interval and speaker position;
L channel mixing coefficient setting means for presetting an L channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position and the noise component is suppressed. And L channel filter coefficient calculation means for calculating an L channel filter coefficient from the L channel mixing coefficient,
R channel mixing coefficient setting means for setting in advance an R channel mixing coefficient corresponding to each speaker position;
The stored covariance matrix under the condition that each speaker speech component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, and the noise component is suppressed. And R channel filter coefficient calculating means for calculating an R channel filter coefficient from the R channel mixing coefficient,
L channel filter means for filtering received sound signals received by each of the plurality of sound collection means, respectively, with the L channel filter coefficients;
R channel filter means for filtering the received sound signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
L channel addition means for adding the output signals of the L channel filter means;
A sound collecting device comprising: R channel adding means for adding the output signals of the R channel filter means.
収音装置であって、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から送話区間、受話区間、雑音区間を検出する送受話検出手段と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出手段と、
複数の収音手段の各々で受音された受音信号から共分散行列を計算する共分散行列計算手段と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶手段と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算手段と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ手段と、
前記複数の収音手段の各々で受音された受音信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ手段と、
前記Lチャネルフィルタ手段の出力信号を加算するLチャネル加算手段と、
前記Rチャネルフィルタ手段の出力信号を加算するRチャネル加算手段とを有する収音装置。
A sound collecting device,
A transmission / reception detecting means for detecting a transmission section, a reception section, and a noise section from a reception signal received by each of a plurality of sound collection means, an L channel reception signal and an R channel reception signal from a communication partner;
Speaker position detecting means for detecting a speaker position from a received sound signal received by each of a plurality of sound collecting means;
A covariance matrix calculating means for calculating a covariance matrix from the received sound signals received by each of the plurality of sound collecting means;
Covariance matrix storage means for storing the covariance matrix for each reception interval, noise interval, and speaker position;
L channel mixing coefficient setting means for presetting an L channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. L channel filter coefficient calculation means for calculating an L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient;
R channel mixing coefficient setting means for setting in advance an R channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. R channel filter coefficient calculating means for calculating an R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient;
L channel filter means for filtering received sound signals received by each of the plurality of sound collection means, respectively, with the L channel filter coefficients;
R channel filter means for filtering the received sound signal received by each of the plurality of sound collecting means with the R channel filter coefficient;
L channel addition means for adding the output signals of the L channel filter means;
A sound collecting device comprising: R channel adding means for adding the output signals of the R channel filter means.
収音装置であって、
複数の収音手段の各々で受音された受音信号と、通信相手からのLチャネル受話信号とRチャネル受話信号から、送話区間、受話区間、雑音区間を検出する送受話検出手段と、
複数の収音手段の各々で受音された受音信号から話者位置を検出する話者位置検出手段と、
複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号から共分散行列を計算する共分散行列計算手段と、
前記共分散行列を受話区間と雑音区間と話者位置ごとに記憶する共分散行列記憶手段と、
各話者位置に対応するLチャネルミキシング係数をあらかじめ設定するLチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Lチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Lチャネルミキシング係数からLチャネルフィルタ係数を算出するLチャネルフィルタ係数計算手段と、
各話者位置に対応するRチャネルミキシング係数をあらかじめ設定するRチャネルミキシング係数設定手段と、
複数の収音手段の各々で受音された各話者音声成分が各話者位置に対応する前記Rチャネルミキシング係数でミキシングされ、受話信号成分が抑圧され、雑音成分が抑圧される条件で、前記記憶された共分散行列と前記Rチャネルミキシング係数からRチャネルフィルタ係数を算出するRチャネルフィルタ係数計算手段と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Lチャネルフィルタ係数で各々フィルタリングするLチャネルフィルタ手段と、
前記複数の収音手段の各々で受音された受音信号とLチャネル受話信号とRチャネル受話信号を、前記Rチャネルフィルタ係数で各々フィルタリングするRチャネルフィルタ手段と、
前記Lチャネルフィルタ手段の出力信号を加算するLチャネル加算手段と、
前記Rチャネルフィルタ手段の出力信号を加算するRチャネル加算手段とを有する収音装置。
A sound collecting device,
A transmission / reception detecting means for detecting a transmission interval, a reception interval, and a noise interval from a received sound signal received by each of a plurality of sound collection means, and an L channel reception signal and an R channel reception signal from a communication partner;
Speaker position detecting means for detecting a speaker position from a received sound signal received by each of a plurality of sound collecting means;
Covariance matrix calculating means for calculating a covariance matrix from the received sound signal received by each of the plurality of sound collecting means, the L channel received signal, and the R channel received signal;
Covariance matrix storage means for storing the covariance matrix for each reception interval, noise interval, and speaker position;
L channel mixing coefficient setting means for presetting an L channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the L channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. L channel filter coefficient calculating means for calculating an L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient;
R channel mixing coefficient setting means for setting in advance an R channel mixing coefficient corresponding to each speaker position;
Each speaker voice component received by each of the plurality of sound pickup means is mixed with the R channel mixing coefficient corresponding to each speaker position, the received signal component is suppressed, and the noise component is suppressed. R channel filter coefficient calculating means for calculating an R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient;
L channel filter means for filtering the received sound signal, the L channel received signal, and the R channel received signal received by each of the plurality of sound collecting means, respectively, with the L channel filter coefficient;
R channel filter means for filtering the received sound signal, the L channel received signal, and the R channel received signal received by each of the plurality of sound collecting means, respectively, with the R channel filter coefficient;
L channel addition means for adding the output signals of the L channel filter means;
A sound collecting device comprising: R channel adding means for adding the output signals of the R channel filter means.
Speaker voice level estimating means for estimating the voice level of each speaker from the stored covariance matrix of each speaker;
A gain calculating unit for calculating, from the voice level of each speaker, a gain for each speaker such that each speaker's voice is output at an appropriate level;
The L channel filter coefficient calculating means calculating the L channel filter coefficient from the stored covariance matrix and the L channel mixing coefficient under the further condition that the gain for each speaker is multiplied, the received signal component is suppressed, and the noise component is suppressed;
The R channel filter coefficient calculating means calculating the R channel filter coefficient from the stored covariance matrix and the R channel mixing coefficient under the further condition that the gain for each speaker is multiplied, the received signal component is suppressed, and the noise component is suppressed;
The sound collecting device according to any one of claims 9 to 12.
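
As a rough sketch of the speaker voice level estimating means and the gain calculating unit above: the diagonal of a stored speaker covariance matrix gives that speaker's average power per channel, from which a level-aligning gain can be derived. The target level, the restriction to the microphone channels, and the names below are assumptions made for this illustration, not details from the specification.

```python
import numpy as np

def speaker_gains(stored, speaker_ids, n_mics, target_power=1.0):
    """Estimate each speaker's power from the stored covariance matrices and
    derive a gain that brings every speaker to roughly the same output level."""
    gains = {}
    for k in speaker_ids:
        cov = stored[k]                                      # (n_bins, n_ch, n_ch)
        diag = np.diagonal(cov, axis1=1, axis2=2).real       # per-channel power per bin
        power = diag[:, :n_mics].mean()                      # microphone channels only (assumption)
        gains[k] = np.sqrt(target_power / max(power, 1e-12))
    return gains
```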
Whitening means for multiplying the stored covariance matrix by a gain that smooths the frequency characteristic of the diagonal component having the largest power among the diagonal components of the stored covariance matrix, or of the sum of the diagonal components of the stored covariance matrix, and for inputting the whitened covariance matrix to the L channel filter coefficient calculating means and the R channel filter coefficient calculating means;
The sound collecting device according to any one of claims 9 to 13.
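
One way to picture the whitening means: compute the frequency characteristic named in the claim (the sketch below uses the sum-of-diagonal-components variant), smooth it across frequency, and scale each bin's covariance matrix so that this characteristic becomes flat. A minimal sketch under those assumptions; the smoothing window and the names are illustrative only.

```python
import numpy as np

def whiten_covariance(cov, smooth_bins=9, floor=1e-12):
    """Multiply each frequency bin of cov (n_bins, n_ch, n_ch) by a gain that
    flattens the frequency characteristic of the summed diagonal power."""
    power = np.trace(cov, axis1=1, axis2=2).real             # summed diagonal per bin
    kernel = np.ones(smooth_bins) / smooth_bins
    smoothed = np.convolve(power, kernel, mode='same')       # smooth over frequency
    gain = smoothed.mean() / np.maximum(smoothed, floor)     # flattening gain per bin
    return cov * gain[:, None, None]
```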
FFT means for converting the signals received by each of the plurality of sound collecting means and the received signals from time domain signals to frequency domain signals;
IFFT means for converting the output signals of the L channel adding means and the R channel adding means from frequency domain signals to time domain signals;
Each of the above means operating in the frequency domain;
The sound collecting device according to any one of claims 9 to 14.
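
Operationally, the FFT and IFFT means wrap the filter-and-sum structure in a frame-by-frame frequency-domain loop: transform each input channel, multiply by the per-bin filter coefficients, sum across channels for each output channel, and transform back. The framing details and the names below (`filters_L`, `filters_R`) are assumptions for this sketch; only that general flow follows the claim.

```python
import numpy as np

def process_frame(frames, filters_L, filters_R, n_fft=512):
    """One frame of frequency-domain filter-and-sum.

    frames    : (n_ch, n_fft) windowed time-domain samples of the M microphone
                signals plus the L and R received signals
    filters_L : (n_ch, n_bins) complex L channel filter coefficients
    filters_R : (n_ch, n_bins) complex R channel filter coefficients
    Returns the L and R channel time-domain output frames.
    """
    spectra = np.fft.rfft(frames, n=n_fft, axis=-1)          # FFT means
    out_L = (filters_L * spectra).sum(axis=0)                # L channel filters + adder
    out_R = (filters_R * spectra).sum(axis=0)                # R channel filters + adder
    return (np.fft.irfft(out_L, n=n_fft),                    # IFFT means
            np.fft.irfft(out_R, n=n_fft))
```

In a real-time system this per-frame processing would sit inside a windowed overlap-add loop; that outer loop is omitted here.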
The sound collecting device according to any one of claims 9 to 15, wherein the L and R channel filter coefficient calculating means, the L and R channel filter means, and the L and R channel adding means are replaced with 1st to J-th channel filter coefficient calculating means, 1st to J-th channel filter means, and 1st to J-th channel adding means for three or more channels.
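
Generalising from L/R to J output channels, as in the claim above, mostly means carrying one filter set per output channel. A compact sketch under the same assumptions as the previous example:

```python
import numpy as np

def process_frame_multichannel(frames, filters, n_fft=512):
    """filters: (J, n_ch, n_bins) coefficients, one filter set per output channel;
    returns (J, n_fft) time-domain output frames, one per output channel."""
    spectra = np.fft.rfft(frames, n=n_fft, axis=-1)          # (n_ch, n_bins)
    outputs = np.einsum('jcb,cb->jb', filters, spectra)      # per-channel filter-and-sum
    return np.fft.irfft(outputs, n=n_fft, axis=-1)
```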
A sound collection program for causing a computer to execute the sound collection method according to any one of claims 1 to 8.
A recording medium on which the sound collection program according to claim 17 is recorded.
JP2003370697A 2003-10-30 2003-10-30 Sound collection method, apparatus, program, and recording medium Expired - Fee Related JP4298466B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003370697A JP4298466B2 (en) 2003-10-30 2003-10-30 Sound collection method, apparatus, program, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003370697A JP4298466B2 (en) 2003-10-30 2003-10-30 Sound collection method, apparatus, program, and recording medium

Publications (2)

Publication Number Publication Date
JP2005136709A JP2005136709A (en) 2005-05-26
JP4298466B2 true JP4298466B2 (en) 2009-07-22

Family

ID=34647631

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003370697A Expired - Fee Related JP4298466B2 (en) 2003-10-30 2003-10-30 Sound collection method, apparatus, program, and recording medium

Country Status (1)

Country Link
JP (1) JP4298466B2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2426168B (en) * 2005-05-09 2008-08-27 Sony Comp Entertainment Europe Audio processing
JP2009116245A (en) * 2007-11-09 2009-05-28 Yamaha Corp Speech enhancement device
JP5022459B2 (en) * 2010-03-03 2012-09-12 日本電信電話株式会社 Sound collection device, sound collection method, and sound collection program
EP2560161A1 (en) 2011-08-17 2013-02-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Optimal mixing matrices and usage of decorrelators in spatial audio processing
KR102112018B1 (en) * 2013-11-08 2020-05-18 한국전자통신연구원 Apparatus and method for cancelling acoustic echo in teleconference system

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07250397A (en) * 1994-03-09 1995-09-26 Nippon Telegr & Teleph Corp <Ntt> Echo cancellation method and equipment embodying this method
JPH1042396A (en) * 1996-07-23 1998-02-13 Sanyo Electric Co Ltd Acoustic image controller
JPH10257598A (en) * 1997-03-14 1998-09-25 Nippon Telegr & Teleph Corp <Ntt> Sound signal synthesizer for localizing virtual sound image
JP3541339B2 (en) * 1997-06-26 2004-07-07 富士通株式会社 Microphone array device
JPH11304906A (en) * 1998-04-20 1999-11-05 Nippon Telegr & Teleph Corp <Ntt> Sound-source estimation device and its recording medium with recorded program
JP3878892B2 (en) * 2002-08-21 2007-02-07 日本電信電話株式会社 Sound collection method, sound collection device, and sound collection program
US7716044B2 (en) * 2003-02-07 2010-05-11 Nippon Telegraph And Telephone Corporation Sound collecting method and sound collecting device
JP4119328B2 (en) * 2003-08-15 2008-07-16 日本電信電話株式会社 Sound collection method, apparatus thereof, program thereof, and recording medium thereof.

Also Published As

Publication number Publication date
JP2005136709A (en) 2005-05-26

Similar Documents

Publication Publication Date Title
US9922663B2 (en) Voice signal processing method and apparatus
JP5654513B2 (en) Sound identification method and apparatus
US9210504B2 (en) Processing audio signals
EP2749016B1 (en) Processing audio signals
JP4286637B2 (en) Microphone device and playback device
US9232309B2 (en) Microphone array processing system
KR101934999B1 (en) Apparatus for removing noise and method for performing thereof
JP7352291B2 (en) sound equipment
JP4249729B2 (en) Automatic gain control method, automatic gain control device, automatic gain control program, and recording medium recording the same
WO2004071130A1 (en) Sound collecting method and sound collecting device
JP5611970B2 (en) Converter and method for converting audio signals
CN1902901A (en) System and method for enhanced subjective stereo audio
JP2001309483A (en) Sound pickup method and sound pickup device
CN105284133A (en) Apparatus and method for center signal scaling and stereophonic enhancement based on a signal-to-downmix ratio
JP5034607B2 (en) Acoustic echo canceller system
JP5762479B2 (en) Voice switch device, voice switch method, and program thereof
JP4298466B2 (en) Sound collection method, apparatus, program, and recording medium
JP4116600B2 (en) Sound collection method, sound collection device, sound collection program, and recording medium recording the same
US20130253923A1 (en) Multichannel enhancement system for preserving spatial cues
JP2005064968A (en) Method, device and program for collecting sound, and recording medium
JP5267808B2 (en) Sound output system and sound output method
JP5937451B2 (en) Echo canceling apparatus, echo canceling method and program
JP4080987B2 (en) Echo / noise suppression method and multi-channel loudspeaker communication system
JP2002062900A (en) Sound collecting device and signal receiving device
JP2005062096A (en) Detection method of speaker position, system, program and record medium

Legal Events

Date Code Title Description
RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20050621

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20060417

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20090408

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20090415

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120424

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130424

Year of fee payment: 4

LAPS Cancellation because of no payment of annual fees