JP2011172081A

JP2011172081A - Amplifying conversation method, device and program

Info

Publication number: JP2011172081A
Application number: JP2010034890A
Authority: JP
Inventors: Sumitaka Sakauchi; 澄宇阪内; Akira Emura; 暁江村; Kenta Niwa; 健太丹羽; Yoichi Haneda; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-02-19
Filing date: 2010-02-19
Publication date: 2011-09-01

Abstract

PROBLEM TO BE SOLVED: To provide an amplifying conversation device, method, and program that can automatically control the sound volume for each microphone without degrading follow-up performance and without increasing the computation or memory required by the adaptive filter.

SOLUTION: The amplifying conversation device includes a main channel estimating section 430 for estimating one or more microphones as a main channel, an addition section 440 for adding the sounds collected by the microphones as sound signals, an echo cancel section 450 for canceling the echo of the added sound signal, a sound detecting section 461 for detecting voice in the echo-canceled sound signal, a superposition gain calculating section 462 for setting an initial value based on the main channel when voice is detected and calculating a superposition gain using the initial value and the echo-canceled sound signal, a gain superposing section 463 for superposing the superposition gain on the echo-canceled sound signal, and a channel-specific superposition gain storing section 464 for storing the superposition gain in association with the main channel.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a loudspeaker communication method, a loudspeaker communication device, and a loudspeaker communication program that, in remote communication using sound signals, automatically control the voice level output from a speaker according to, among other things, the voice level input to a microphone.

One function of loudspeaker communication devices used in teleconferencing is an echo canceller, which uses an adaptive filter to prevent acoustic echo, as shown in FIG. 1. Some echo cancellers also use a voice switch, as shown in FIG. 2, and therefore require no initial training of the adaptive filter.
For example, in the echo canceller shown in FIG. 1, the adaptive filter 914 estimates a transfer path corresponding to the acoustic echo path 913 from the speaker 911 to the microphone 912. Based on the estimated transfer path, the adaptive filter 914 synthesizes a pseudo echo signal and cancels the echo signal by subtracting the pseudo echo signal from it.
In the echo canceller shown in FIG. 2, the BG adaptive filter 924 performs the adaptive processing, and the echo cancellation filter 925 estimates the pseudo echo using coefficients transferred from the BG adaptive filter 924. The voice switch control circuit 926 inserts loss mainly at power-on and reduces the inserted loss as the adaptive filter converges (see Non-Patent Document 1).
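The estimate-synthesize-subtract loop of the FIG. 1 echo canceller can be sketched as follows. The source does not name an adaptation algorithm, so normalized LMS (NLMS) is assumed here purely for illustration; the function name `nlms_echo_canceller` and the parameters `taps`, `mu`, and `eps` are hypothetical.

```python
def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Cancel the echo of `far_end` contained in `mic`.

    far_end: samples sent to the speaker (speaker 911 in FIG. 1).
    mic:     samples picked up by the microphone (microphone 912).
    NLMS is an assumption; the source only says "adaptive filter".
    Returns (residual, weights).
    """
    weights = [0.0] * taps   # estimate of the acoustic echo path
    buf = [0.0] * taps       # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        # Synthesize the pseudo echo from the estimated transfer path.
        pseudo_echo = sum(w * b for w, b in zip(weights, buf))
        # Subtract the pseudo echo from the microphone signal.
        e = d - pseudo_echo
        # Normalized LMS coefficient update.
        norm = sum(b * b for b in buf) + eps
        weights = [w + mu * e * b / norm for w, b in zip(weights, buf)]
        residual.append(e)
    return residual, weights
```

With a noise-free microphone signal that contains only the echo, the weights should approach the true echo path and the residual should shrink toward zero.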

Another function of loudspeaker communication devices used in teleconferencing is an automatic volume control device such as the one shown in FIG. 3.
In the automatic volume control device of FIG. 3, the input signal is first cut out by a cutout window and stored in the buffer 931. Next, the speech signal identification circuit 932 identifies whether the input signal is a speech signal. The volume amplification circuit 933 determines the amplification factor of the input signal based on that identification result. The waveform superimposing circuit 934 superimposes on the input signal a cosine window based on the amplification factor determined by the volume amplification circuit 933, and outputs the result as the output signal. The algorithms used in the speech signal identification circuit 932 include hidden Markov models, vector quantization, and neural networks (see Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. H8-250944 (JP-A-8-250944)

Non-Patent Document 1: Nobuhiko Kitawaki (ed.), "Future Net Technology Series: Digital Speech and Audio Technology," Ohmsha, pp. 218-255.

For example, when a teleconference has multiple speakers and each speaker has a microphone, the distance between each speaker and the nearby microphone differs, as does the loudness of each speaker's voice. As a result, the output voice level varies from microphone to microphone, and the output becomes hard to hear.

Solving this problem requires automatic volume control to be performed individually for each of the microphones.
However, the adaptive filter operates by estimating the acoustic echo path as a linear system, whereas the automatic volume control described above is a nonlinear process, so that control cannot be placed between the adaptive filter and the acoustic echo path.
Automatic volume control must therefore be performed on the transmission network side of the adaptive filter. Consequently, to perform automatic volume control individually for each microphone, as many sound signals as there are microphones must be input to the adaptive filter, and the adaptive filter's computation and memory grow in proportion to the number of microphones.

Solving this new problem requires summing the input sound signals from the microphones into a single signal before it is input to the adaptive filter, which reduces the adaptive filter's computation and memory.
However, the automatic volume control described above takes time to settle. When the microphone that mainly picks up sound (the main channel; the same applies hereinafter) changes frequently as the speaker changes, follow-up performance deteriorates, which is a further problem.

The present invention has been made to solve these problems. Its object is to provide a loudspeaker communication method, device, and program that enable individual automatic volume control for each of a plurality of microphones without increasing the adaptive filter's computation or memory, and without degrading follow-up performance even when the main channel changes frequently.

The loudspeaker communication method of the present invention includes main channel estimation processing, addition processing, echo cancellation processing, voice detection processing, superposition gain calculation processing, gain superposition processing, and channel-specific superposition gain storage processing.
The main channel estimation processing estimates one or more of N microphones (where N is an integer of 2 or more) as the main channel. The addition processing adds the sounds collected by the microphones as sound signals. The echo cancellation processing cancels the echo of the added sound signal. The voice detection processing detects voice in the echo-cancelled sound signal. When voice is detected, the superposition gain calculation processing sets an initial value based on the estimated main channel and calculates a superposition gain using this initial value and the echo-cancelled sound signal. The gain superposition processing superimposes the calculated superposition gain on the echo-cancelled sound signal. The channel-specific superposition gain storage processing stores the calculated superposition gain in a channel-specific superposition gain storage unit in association with the estimated main channel.
The initial value set in the superposition gain calculation processing may be a past superposition gain that is stored for each main channel in the channel-specific superposition gain storage unit at the time the initial value is set.
The microphones may also be divided into M groups (where M is an integer of 2 or more), each containing a plurality of microphones, and the above processing may be performed group by group on the sounds collected by the microphones of each group.

According to the present invention, the input sound signals from a plurality of microphones are added in the addition processing, and echo cancellation is performed on the added sound signal, so the adaptive filter's computation and memory do not increase. In addition, one or more of the N microphones are estimated as the main channel, and the superposition gain is calculated from an initial value set according to the estimated main channel and from the echo-cancelled sound signal, then superimposed. Follow-up performance therefore does not deteriorate even when the main channel changes frequently. The loudspeaker communication method of the present invention thus enables individual automatic volume control for each of a plurality of microphones without increasing the adaptive filter's computation or memory, and without degrading follow-up performance when the main channel changes frequently.

FIG. 1 is a diagram explaining a conventional example.
FIG. 2 is a diagram explaining a conventional example.
FIG. 3 is a diagram explaining a conventional example.
FIG. 4 is a block diagram showing the configuration of the loudspeaker communication device according to Embodiment 1 and Modification 1.
FIG. 5 is a flowchart showing the operation of the loudspeaker communication device according to Embodiment 1.
FIG. 6 is a flowchart showing the operation of the loudspeaker communication device according to Modification 1.
FIG. 7 is a block diagram showing the configuration of the loudspeaker communication device according to Embodiment 2.
FIG. 8 is a flowchart showing the operation of the loudspeaker communication device according to Embodiment 2.

Embodiments of the present invention are described in detail below.

The loudspeaker communication device and method according to Embodiment 1 of the present invention are described with reference to FIGS. 4 and 5. FIG. 4 is a block diagram showing the configuration of the loudspeaker communication device 400 according to Embodiment 1. FIG. 5 is a flowchart showing the operation of the loudspeaker communication device 400. The loudspeaker communication device 400 includes a speaker 911, N microphones 912-1 to 912-N (where N is an integer of 2 or more; the same applies hereinafter), a main channel estimation unit 430, an addition unit 440, an echo cancellation unit 450, and an automatic volume adjustment unit 460. The automatic volume adjustment unit 460 includes a voice detection unit 461, a superposition gain calculation unit 462, a gain superposition unit 463, and a channel-specific superposition gain storage unit 464.

A sound signal input to the loudspeaker communication device 400 from the network 470 passes through the echo cancellation unit 450 and is output from the speaker 911. The microphones 912-1 to 912-N convert the collected sound into sound signals and input them to both the addition unit 440 and the main channel estimation unit 430.
The main channel estimation unit 430 estimates one or more of the N microphones 912-1 to 912-N as the main channel (S530). One estimation method compares the time-signal power of the sound signals output from the microphones 912-1 to 912-N and takes the microphone with the maximum power as the main channel. The estimation is not limited to this method; any method that tracks speaker changes well may be used. The number of estimated main channels may be any number that is at least 1 and less than N. When the main channel is estimated by comparing time-signal power, a predetermined number of microphones may be selected in descending order of power, or every microphone whose time-signal power exceeds a predetermined threshold may be selected. Here too, any other method that tracks speaker changes well may be used.
The addition unit 440 adds the sounds collected by the microphones as sound signals (S540). The echo cancellation unit 450 cancels the echo of the added sound signal (S550). The echo cancellation unit 450 may be a general echo canceller as in FIG. 1 or one combined with a voice switch as in FIG. 2.
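The power-based main channel estimation (S530) and the addition step (S540) can be sketched as below. This is a minimal illustration, not the patented implementation: the function names, the use of mean squared amplitude as "time-signal power", and the frame-based processing are all assumptions.

```python
from typing import List, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (assumed power measure)."""
    return sum(x * x for x in frame) / len(frame)


def estimate_main_channels(frames: Sequence[Sequence[float]],
                           top_n: int = 1) -> List[int]:
    """Pick the top_n microphones in descending order of frame power."""
    powers = [frame_power(f) for f in frames]
    ranked = sorted(range(len(powers)), key=lambda i: powers[i], reverse=True)
    return sorted(ranked[:top_n])


def estimate_main_channels_by_threshold(frames: Sequence[Sequence[float]],
                                        threshold: float) -> List[int]:
    """Pick every microphone whose frame power exceeds the threshold."""
    return [i for i, f in enumerate(frames) if frame_power(f) > threshold]


def mix(frames: Sequence[Sequence[float]]) -> List[float]:
    """Add the microphone frames sample by sample (addition unit 440)."""
    return [sum(samples) for samples in zip(*frames)]
```

Only the single mixed signal returned by `mix` would be passed to the echo canceller, which is why the adaptive filter cost does not grow with the number of microphones.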

The voice detection unit 461 detects whether voice is present in the echo-cancelled sound signal (S561). A sufficient detection method is to judge that voice has been detected when the sound signal level exceeds a threshold. The threshold is determined in advance from the ambient noise level. Detection is not limited to this method; as with the main channel estimation unit 430, any method that tracks speaker changes well may be used.
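The threshold-based voice detection (S561) might look like the following sketch. The source says only that the threshold is set in advance from the ambient noise level; the RMS level measure and the `margin` multiplier used here are assumptions, and all names are hypothetical.

```python
def signal_level(frame):
    """Root-mean-square level of one frame (assumed level measure)."""
    return (sum(x * x for x in frame) / len(frame)) ** 0.5


def make_threshold(noise_frames, margin=2.0):
    """Assumed rule: threshold = margin x average level of noise-only frames."""
    levels = [signal_level(f) for f in noise_frames]
    return margin * sum(levels) / len(levels)


def voice_detected(frame, threshold):
    """True when the echo-cancelled frame level exceeds the threshold."""
    return signal_level(frame) > threshold
```

When `voice_detected` returns False, the output level would be left unadjusted, as the description states for frames without voice.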

When voice is detected, the superposition gain calculation unit 462 sets an initial value based on the estimated main channel (S562a) and calculates the superposition gain using this initial value and the echo-cancelled sound signal (S562b). When no voice is detected, the output voice level is not adjusted. The calculation of the superposition gain is described in detail below. Suppose that at time $t$ the main channel is estimated to be $C_n$. Let $G_{C_n}[t]$ be the superposition gain to be calculated, $V_{C_n}[t]$ the input sound signal level from main channel $C_n$, $V_d$ the target output level of the sound signal output to the network 470, $G_i$ the initial value, and $\alpha$ the time constant. The superposition gain $G_{C_n}[t]$ is then calculated as a sum weighted by the forgetting factor:
$$G_{C_n}[t] = (1-\alpha)\,\frac{V_d}{V_{C_n}[t]} + \alpha\,G_i$$
The equation above uses the ratio $V_d / V_{C_n}[t]$ of the target output sound signal level to the input sound signal level, but other methods may be used.
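The gain formula G_Cn[t] = (1−α) × V_d / V_Cn[t] + α × G_i can be written directly in code. The sketch below is illustrative only; the function names and the input validation are assumptions, and "superimposing" the gain is interpreted here as multiplying each sample by it.

```python
def superposition_gain(v_target, v_input, g_init, alpha):
    """Compute (1 - alpha) * V_d / V_Cn[t] + alpha * G_i.

    v_target: target output level V_d
    v_input:  input level V_Cn[t] of the main channel (must be > 0)
    g_init:   initial value G_i
    alpha:    time constant (forgetting factor), assumed to lie in [0, 1]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    if v_input <= 0.0:
        raise ValueError("input level must be positive")
    return (1.0 - alpha) * v_target / v_input + alpha * g_init


def apply_gain(frame, gain):
    """Assumed meaning of gain superposition: scale every sample."""
    return [gain * x for x in frame]
```

With alpha = 0 the gain jumps straight to the ratio V_d / V_Cn[t]; with alpha close to 1 it stays near the initial value, so alpha trades responsiveness against smoothness.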

Here, the initial value $G_i$ may be the superposition gain $G_{C_n}[t-k]$ of the same main channel from $k$ time steps earlier, where $k$ is a positive integer. In that case, the superposition gain at time $t$ is calculated as
$$G_{C_n}[t] = (1-\alpha)\,\frac{V_d}{V_{C_n}[t]} + \alpha\,G_{C_n}[t-k]$$
Alternatively, a plurality of superposition gains from time $t-k$ to time $t-1$ may be used to calculate the initial value $G_i$ at time $t$. In this case, when the main channel at time $t$ is $C_n$, the initial value $G_i$ and the superposition gain $G_{C_n}[t]$ are calculated using the time constants $\alpha_t, \alpha_{t-1}, \alpha_{t-2}, \ldots, \alpha_{t-k}$ as follows:

$$G_{C_n}[t] = \alpha_t\,\frac{V_d}{V_{C_n}[t]} + G_i$$
where [equation defining $G_i$ not reproduced in the source text]

The method of calculating the superposition gain is not limited to the equations above. A method that smooths the change in superposition gain over time using several past superposition gains of the same main channel may also be used.

When the main channel estimation unit 430 estimates a plurality of main channels, suppose the estimated main channels are the $n$ channels $C_1, C_2, C_3, \ldots, C_n$ and the input sound signal levels from them are $V_{C_1}[t], V_{C_2}[t], V_{C_3}[t], \ldots, V_{C_n}[t]$. The superposition gain $G_{C_1, C_2, C_3, \ldots, C_n}[t]$ at time $t$ can then be calculated by

[equation not reproduced in the source text]

Here, $a(n)$ is a correction coefficient that depends on the number of main channels $n$ and takes a value greater than 1.

The gain superposition unit 463 superimposes the superposition gain calculated by the superposition gain calculation unit 462 on the echo-cancelled sound signal (S563). The channel-specific superposition gain storage unit 464 stores the superposition gain calculated by the superposition gain calculation processing (S562a, S562b) in association with the estimated main channel (S564). For example, when the main channel at time $t$ is $C_n$, the channel-specific superposition gain storage unit 464 stores the superposition gain $G_{C_n}[t]$ in the storage area corresponding to channel $C_n$. For channels not estimated as the main channel, the superposition gains already held in the storage unit 464 at time $t$ are retained unchanged and are read out as needed in calculations after time $t$. When a plurality of superposition gains from time $t-k$ to time $t-1$ are used to calculate the initial value $G_i$, the storage area corresponding to channel $C_n$ and time $t-k$, for example, stores the superposition gain $G_{C_n}[t-k]$. The number of stored superposition gains is therefore $N$ across channels and $k$ across time steps, for a total of $N \times k$.

When there are N microphones and n main channels are estimated (n being an integer of 2 or more), a superposition gain is calculated for each combination of n main channels chosen from the N microphones, that is, for ${}_N C_n$ combinations. The channel-specific superposition gain storage unit 464 therefore stores a total of ${}_N C_n$ superposition gains, one in the storage area corresponding to each channel combination.
In addition, when a plurality of superposition gains from time $t-k$ to time $t-1$ are used to calculate the initial value $G_i$, the total number of stored superposition gains is ${}_N C_n \times k$.
The superposition gains stored in the channel-specific superposition gain storage unit 464 in this way are read out by the superposition gain calculation unit 462 whenever it sets an initial value or calculates a new superposition gain. At power-on, the loudspeaker communication device 400 sets the value 1 in every storage area of the channel-specific superposition gain storage unit 464. Each time a superposition gain is calculated, the superposition gain in the storage area corresponding to the main channel is updated, or the gain is stored in the storage area corresponding to each time.
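The channel-specific storage described above can be sketched as a small keyed store. This is an assumption-laden illustration: the class name `ChannelGainStore`, the lazy creation of storage areas (equivalent in effect to presetting every area to 1 at power-on), and the helper `storage_size` are all hypothetical.

```python
from collections import deque
from math import comb


class ChannelGainStore:
    """Keeps the last `history_len` gains per main-channel combination."""

    def __init__(self, history_len=1, initial_gain=1.0):
        self.history_len = history_len
        self.initial_gain = initial_gain  # value 1 set at power-on
        self._store = {}

    def history(self, channels):
        """Gains for this channel combination, oldest first."""
        key = tuple(sorted(channels))
        if key not in self._store:
            # Lazily create the area, filled with the power-on value.
            self._store[key] = deque(
                [self.initial_gain] * self.history_len,
                maxlen=self.history_len,
            )
        return self._store[key]

    def latest(self, channels):
        return self.history(channels)[-1]

    def update(self, channels, gain):
        """Store a new gain; other channels' gains stay untouched."""
        self.history(channels).append(gain)


def storage_size(num_mics, num_main, history_len):
    """Number of stored gains: C(N, n) combinations times k time steps."""
    return comb(num_mics, num_main) * history_len
```

For a single main channel, `storage_size(N, 1, k)` reduces to N × k, matching the count given for that case.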

In this embodiment, the addition unit 440 adds the plurality of input sound signals (S540) and the echo cancellation unit 450 performs echo cancellation on the added sound signal (S550), so the adaptive filter's computation and memory do not increase. In addition, the main channel estimation unit 430 estimates one or more of the N microphones as the main channel (S530), the superposition gain calculation unit 462 sets an initial value based on the estimated main channel (S562a) and calculates the superposition gain using that initial value and the echo-cancelled sound signal (S562b), and the gain superposition unit 463 superimposes the superposition gain on the sound signal (S563). Follow-up performance therefore does not deteriorate even when the main channel changes frequently. The loudspeaker communication device and method of this embodiment thus enable individual automatic volume control for each of a plurality of microphones without increasing the adaptive filter's computation or memory, and without degrading follow-up performance when the main channel changes frequently.

Furthermore, in this embodiment the main channel estimation unit 430 estimates a plurality of microphones as main channels, so automatic volume control is performed appropriately even when a speaker's voice is picked up across several microphones. The superposition gain calculation unit 462 also uses the superposition gain calculated k time steps earlier as the initial value when calculating the current superposition gain, so the automatic volume control reflects previously calculated superposition gains. Finally, because the superposition gain calculation unit 462 multiplies the initial value by a forgetting factor, the degree to which previously calculated superposition gains are reflected can be set arbitrarily.

[Modification 1]
Modification 1 of Embodiment 1 is described with reference to FIGS. 4 and 6.
In Modification 1, the superposition gain calculation unit 462' sets as the initial value the superposition gain that is stored in the channel-specific superposition gain storage unit 464 at the time the initial value is calculated (S662a). Specifically, the initial value $G_i$ is the superposition gain $G_{C_n}[t-1]$ stored in the channel-specific superposition gain storage unit 464 at time $t-1$. In detail, the superposition gain $G_{C_n}[t]$ at time $t$ with main channel $C_n$ is calculated from the time constant $\alpha$ and the target output sound signal level $V_d$ as
$$G_{C_n}[t] = (1-\alpha)\,\frac{V_d}{V_{C_n}[t]} + \alpha\,G_{C_n}[t-1]$$
The calculated superposition gain $G_{C_n}[t]$ is stored in the channel-specific superposition gain storage unit 464 in association with the main channel. When the main channel is again estimated to be $C_n$ after time $t$, it is used as the initial value $G_i$ in the new superposition gain calculation.
Because this modification has the configuration described above, it provides the same effects as Embodiment 1.
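The recursion of Modification 1 can be illustrated with a short simulation in which the main channel switches between two microphones. This is a sketch under assumptions: every frame is taken to contain voice, exactly one main channel is estimated per frame, and the function name `run_agc` and its parameters are hypothetical.

```python
def run_agc(levels, main_channels, v_target=1.0, alpha=0.5, num_channels=2):
    """Apply G[t] = (1 - alpha) * V_d / V[t] + alpha * G_stored per channel.

    levels:        input level V_Cn[t] of the main channel at each time step
    main_channels: index of the estimated main channel at each time step
    Returns (gain per time step, final stored gain per channel).
    """
    stored = [1.0] * num_channels  # storage areas preset to 1 at power-on
    gains = []
    for v, ch in zip(levels, main_channels):
        # Initial value = gain last stored for this main channel.
        g = (1.0 - alpha) * v_target / v + alpha * stored[ch]
        stored[ch] = g             # only the main channel's area is updated
        gains.append(g)
    return gains, stored


# Channel 0 speaks quietly, channel 1 loudly, then channel 0 speaks again.
gains, stored = run_agc([0.5, 0.5, 2.0, 0.5], [0, 0, 1, 0])
print(gains)  # [1.5, 1.75, 0.75, 1.875]
```

At the last step the gain for channel 0 resumes from its own stored value 1.75 rather than from the 0.75 just used for channel 1, which is the mechanism that keeps follow-up performance from degrading when the main channel switches.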

The loudspeaker communication device and method according to Embodiment 2 of the present invention are described with reference to FIGS. 7 and 8. FIG. 7 is a block diagram showing the configuration of the loudspeaker communication device 700 according to Embodiment 2. FIG. 8 is a flowchart showing the operation of the loudspeaker communication device 700. In Embodiment 2, the microphones are divided into M groups (where M is an integer of 2 or more; the same applies hereinafter), each containing a plurality of microphones, and the processing described in Embodiment 1 is performed group by group on the sounds collected by the microphones of each group.
The loudspeaker communication device 700 includes a speaker 911; M microphone groups, in which the m-th group has $N_m$ microphones 912-m-1 to 912-m-$N_m$ (where m is a positive integer and $N_m$ is an integer of 2 or more; the same applies hereinafter); main channel estimation units 430-1 to 430-M, one per microphone group; addition units 440-1 to 440-M, one per microphone group; an echo cancellation unit 750; and automatic volume adjustment units 460-1 to 460-M, one per microphone group.
For each microphone group, the automatic volume adjustment units 460-1 to 460-M of the loudspeaker communication device 700 include voice detection units 461-1 to 461-M, superposition gain calculation units 462-1 to 462-M, gain superposition units 463-1 to 463-M, and channel-specific superposition gain storage units 464-1 to 464-M.

A sound signal input to the loudspeaker communication device 700 from the network 470 passes through the echo cancellation unit 750 and is output from the speaker 911. The microphones 912-1-1 to 912-M-$N_M$ convert the collected sound into sound signals and input them to both the addition units 440-1 to 440-M and the main channel estimation units 430-1 to 430-M.
The main channel estimation units 430-1 to 430-M estimate one or more of the microphones in each group as the main channel (S830-1 to M). The addition units 440-1 to 440-M add the sounds collected by the microphones as sound signals, group by group (S540-1 to M). The echo cancellation unit 750 cancels the echo of the sound signal added for each group (S550-1 to M). The voice detection units 461-1 to 461-M detect, for each group, whether voice is present in the echo-cancelled sound signal (S561-1 to M). When voice is detected, the superposition gain calculation units 462-1 to 462-M set an initial value based on the estimated main channel (S562a-1 to M) and calculate the superposition gain using this initial value and the echo-cancelled sound signal (S562b-1 to M). The gain superposition units 463-1 to 463-M superimpose the superposition gains calculated by the superposition gain calculation units 462-1 to 462-M on the sound signal echo-cancelled for each group (S563-1 to M). The channel-specific superposition gain storage units 464-1 to 464-M store the superposition gains calculated by the superposition gain calculation processing (S562a-1 to M, S562b-1 to M) in association with the estimated main channel (S564-1 to M).
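The group-by-group processing of Embodiment 2 can be sketched as follows. The source does not say how microphones are assigned to groups, so the contiguous split below is an assumption, as are the function names and the use of summed squared amplitude as the power measure. Echo cancellation and gain control per group are omitted.

```python
def split_into_groups(num_mics, num_groups):
    """Split microphone indices into contiguous groups of 2 or more.

    The contiguous assignment is an assumption made for illustration.
    """
    if num_groups < 2 or num_mics < 2 * num_groups:
        raise ValueError("need M >= 2 groups with at least 2 microphones each")
    base, extra = divmod(num_mics, num_groups)
    groups, start = [], 0
    for m in range(num_groups):
        size = base + (1 if m < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def process_groups(frames, groups):
    """For each group, estimate its main channel and add its signals."""
    results = []
    for group in groups:
        powers = {i: sum(x * x for x in frames[i]) for i in group}
        main = max(powers, key=powers.get)          # per-group estimation
        mixed = [sum(s) for s in zip(*(frames[i] for i in group))]
        results.append((main, mixed))               # one signal per group
    return results
```

Each group yields one mixed signal, so the number of signals reaching the echo cancellation stage is M rather than the total number of microphones.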

Because the present embodiment is configured in this way, it provides the same effects as the first embodiment. In addition, because processing is performed for each of the M groups, individual automatic volume control can be performed for each of the plurality of microphones, even for input sound from a larger number of microphones, without degrading followability. Furthermore, because a sound signal is output for each microphone group, each group's signal can be output from its own corresponding speaker, and the number of speakers on the output side can be expanded to as many as M.
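
As a rough illustration of the grouping, the sketch below partitions the microphone signals into M groups and produces one added signal per group, so that each output could feed its own speaker. The function names and the use of consecutive microphone indices for each group are assumptions made for this example; the text does not state how microphones are assigned to groups.

```python
def split_into_groups(mic_frames, group_sizes):
    """Partition microphone frames into consecutive groups (hypothetical layout)."""
    if sum(group_sizes) != len(mic_frames):
        raise ValueError("group sizes must cover every microphone exactly once")
    if any(size < 2 for size in group_sizes):
        raise ValueError("each group must contain a plurality of microphones")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(mic_frames[start:start + size])
        start += size
    return groups


def add_per_group(groups):
    """Return one added signal per group, i.e. M output signals for M groups."""
    return [[sum(samples) for samples in zip(*group)] for group in groups]


# Four microphones split into M = 2 groups of two microphones each.
mics = [[1, 2], [3, 4], [5, 6], [7, 8]]
outputs = add_per_group(split_into_groups(mics, [2, 2]))
```

Each group's output here is independent of the others, which is what allows the later stages (echo cancellation, voice detection, gain calculation) to run once per group.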

Claims

A main channel estimation process of estimating one or more of N microphones (where N is an integer of 2 or more) as a main channel;
An addition process of adding the sounds collected by the microphones as a sound signal;
An echo cancellation process of canceling echo of the sound signal added in the addition process;
A voice detection process of detecting voice from the sound signal subjected to the echo cancellation process;
A superposition gain calculation process of, when voice is detected by the voice detection process, setting an initial value based on the estimated main channel and calculating a superposition gain using the initial value and the sound signal subjected to the echo cancellation process;
A gain superposition process of superposing the superposition gain calculated by the superposition gain calculation process on the sound signal subjected to the echo cancellation process;
A channel-specific superposition gain storage process of storing the superposition gain calculated by the superposition gain calculation process in a channel-specific superposition gain storage unit in association with the estimated main channel;
A voice call method characterized by comprising the above processes.

The voice call method according to claim 1, wherein
the initial value in the superposition gain calculation process is the superposition gain that, at the time the initial value is set, is stored in the channel-specific superposition gain storage unit in association with the estimated main channel.

3. The voice call method according to claim 1, wherein the microphones are divided into M groups (where M is an integer of 2 or more) such that each group has a plurality of microphones, and each of the processes is performed on the sounds collected by the microphones of each group.

A main channel estimation unit that estimates one or more of N microphones (where N is an integer of 2 or more) as a main channel;
An addition unit that adds the sounds collected by the microphones as a sound signal;
An echo cancellation unit that cancels echo of the sound signal added by the addition unit;
A voice detection unit that detects voice from the sound signal subjected to echo cancellation by the echo cancellation unit;
A superposition gain calculation unit that, when voice is detected by the voice detection unit, sets an initial value based on the estimated main channel and calculates a superposition gain using the initial value and the sound signal subjected to echo cancellation;
A gain superposition unit that superposes the superposition gain calculated by the superposition gain calculation unit on the sound signal subjected to echo cancellation;
A channel-specific superposition gain storage unit that stores the superposition gain calculated by the superposition gain calculation unit in association with the estimated main channel;
A voice call device characterized by comprising the above units.

A voice call program for causing a computer to execute the voice call method according to any one of claims 1 to 3.