JP4850191B2

JP4850191B2 - Automatic volume control device and voice communication device using the same

Info

Publication number: JP4850191B2
Application number: JP2008006823A
Authority: JP
Inventors: 正清田中; 猛大谷
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2008-01-16
Filing date: 2008-01-16
Publication date: 2012-01-11
Anticipated expiration: 2028-01-16
Also published as: JP2009171208A

Description

The present invention relates to an automatic volume control device that controls the volume of an input audio signal, and to a voice communication device using the same.

In recent years, the build-out of networks such as FTTH (Fiber To The Home), public wireless networks, and high-speed mobile communication networks has increased the use of voice communication devices in a wide variety of places. Voice communication systems that support multipoint calls (three or more locations), such as television and audio conferencing systems, are becoming widespread for purposes such as cost reduction.

In a multipoint call, the volume level differs from speaker to speaker because of differences in the sensitivity and directivity of the microphones used, the distance between each microphone and its speaker, and so on. This makes it difficult to adjust the volume on the receiving side.

In many cases, the only volume the receiving-side user can adjust is that of the audio after all speakers' voices have been mixed. Adjusting the volume to suit one speaker therefore tends to make another speaker's volume inappropriate.

To address this, voice communication systems are often equipped with an automatic volume control device (Automatic Gain Control: AGC) that adjusts the volume so that the received voice reaches a predetermined target volume level. As shown in FIG. 1, in automatic volume control the AGC unit 1 adjusts the volume of the input audio signal to the predetermined target volume level and outputs it.

FIG. 2 is a block diagram of an example of the receiving-side portion of a voice communication device used in a multipoint voice communication system. In the figure, the audio signal from one speaker is adjusted to the target volume level by the AGC unit 2 and supplied to the mixing unit 4, and the audio signal from another speaker is adjusted to the target volume level by the AGC unit 3 and supplied to the mixing unit 4. The mixing unit 4 outputs the combined audio signal.

Patent Document 1 describes classifying a signal into voice sections and noise sections and, depending on the SN ratio, setting the amplification factor for noise sections to be no greater than that for voice sections, thereby suppressing the harshness caused by amplified noise.

Patent Document 2 describes determining a reference gain from one of a plurality of input signals and determining the gains of the other signals by automatic fine adjustment based on that reference gain, so that all audio signals can easily be adjusted to the same volume.
JP 2003-60459 A
JP 2006-287716 A

Consider a multipoint call among three or more locations in which one speaker's voice has a poor SN ratio. As shown in FIG. 3, assume that the first speaker's input audio signal has a high SN ratio, the second speaker's has a medium SN ratio, and the third speaker's has a low SN ratio.

Because each speaker's input audio signal is amplified to the target volume level by its AGC unit, the noise level of the third speaker's signal becomes higher than the noise levels of the first and second speakers' signals. As a result, the noise level of the signal obtained by mixing the amplified signals of the first to third speakers increases, the SN ratios of even the first and second speakers' voices are degraded, and the speech becomes hard to hear.

The present invention has been made in view of the above. Its object is to provide an automatic volume control device, and a voice communication device using the same, that keep the noise level from becoming excessive and make other speakers' voices less likely to become hard to hear even when a speaker with a poor SN ratio is present.

An automatic volume control device according to one embodiment of the present invention includes:
voice determination means for determining the voice portions and non-voice portions of an input audio signal;
voice level calculation means for calculating the voice level in the voice portions of the input audio signal;
noise level calculation means for calculating the noise level in the non-voice portions of the input audio signal;
SN ratio calculation means for calculating the SN ratio of the input audio signal from the voice level and the noise level;
amplification factor calculation means for calculating an amplification factor for the input audio signal from the voice level, the noise level, the SN ratio, and a preset target volume level; and
amplification means for amplifying the input audio signal by the amplification factor and outputting it.
When the SN ratio is equal to or greater than a threshold, the amplification factor calculation means calculates the amplification factor so that the voice level becomes the target volume level; when the SN ratio is less than the threshold, it calculates the amplification factor so that the noise level becomes a value lower than the target volume level by the threshold.

An automatic volume control device according to another embodiment of the present invention includes:
voice determination means for determining the voice portions and non-voice portions of an input audio signal;
voice level calculation means for calculating the voice level in the voice portions of the input audio signal;
noise level calculation means for calculating the noise level in the non-voice portions of the input audio signal;
SN ratio calculation means for calculating the SN ratio of the input audio signal from the voice level and the noise level;
amplification factor calculation means for calculating an amplification factor for the input audio signal from the voice level, the noise level, the SN ratio calculated by the SN ratio calculation means, a preset target volume level, and the SN ratios supplied from one or more other automatic volume control devices; and
amplification means for amplifying the input audio signal by the amplification factor and outputting it.
When the SN ratio calculated by the SN ratio calculation means is equal to or greater than the SN ratios supplied from the one or more other automatic volume control devices, the amplification factor calculation means calculates the amplification factor so that the voice level becomes the target volume level; when the calculated SN ratio is less than the SN ratios supplied from the one or more other automatic volume control devices, it calculates the amplification factor so that the noise level becomes a value lower than the target volume level by the highest SN ratio.
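
The relative rule of this second aspect can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes all levels are expressed in dB, that the SN ratio is the voice level minus the noise level, and that "equal to or greater than the SN ratios supplied from the other devices" means equal to or greater than the highest of them. The function and parameter names are hypothetical.

```python
def relative_gain_db(v_ave_db, n_ave_db, target_db, other_snrs_db):
    """Gain in dB for one channel, decided relative to the other channels.

    Assumptions (not stated in this form in the source): levels are in dB,
    SN ratio = voice level - noise level, and the comparison is made
    against the highest SN ratio reported by the other devices.
    """
    own_snr_db = v_ave_db - n_ave_db
    best_other_db = max(other_snrs_db)
    if own_snr_db >= best_other_db:
        # Best channel: bring the voice level to the target volume level.
        return target_db - v_ave_db
    # Otherwise: place the noise level (highest SN ratio) below the target.
    return (target_db - best_other_db) - n_ave_db
```

With this rule, the noise floor of every weaker channel ends up no higher than the noise floor of the best channel after gain is applied.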

In the automatic volume control device, the amplification factor calculation means operates as follows. When the SN ratio calculated by the SN ratio calculation means is equal to or greater than the threshold, or equal to or greater than the SN ratios supplied from the one or more other automatic volume control devices, it calculates the amplification factor so that the voice level becomes the target volume level. When the calculated SN ratio is equal to or less than the threshold and any of the SN ratios supplied from the one or more other automatic volume control devices is equal to or greater than the threshold, it calculates the amplification factor so that the noise level becomes a value lower than the target volume level by the threshold. When the calculated SN ratio is equal to or less than the threshold and all of the SN ratios supplied from the one or more other automatic volume control devices are less than the threshold, it calculates the amplification factor so that the noise level becomes a value lower than the target volume level by the highest of the SN ratios supplied from the one or more other automatic volume control devices.
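
The three-way rule above can be read as an ordered sequence of tests. The sketch below is one possible reading, under stated assumptions: levels are in dB, the SN ratio is voice level minus noise level, the branches are evaluated in the order given (so the first branch wins when the SN ratio equals the threshold), and "equal to or greater than the other SN ratios" is taken to mean equal to or greater than their maximum. All names are hypothetical.

```python
def combined_gain_db(v_ave_db, n_ave_db, target_db, threshold_db, other_snrs_db):
    """Gain in dB using both the fixed threshold and the other channels' SN ratios."""
    own_snr_db = v_ave_db - n_ave_db
    best_other_db = max(other_snrs_db)

    # Branch 1: good in absolute terms, or at least as good as every other channel.
    if own_snr_db >= threshold_db or own_snr_db >= best_other_db:
        return target_db - v_ave_db

    # Branch 2: some other channel meets the threshold, so keep this
    # channel's noise at (target - threshold).
    if best_other_db >= threshold_db:
        return (target_db - threshold_db) - n_ave_db

    # Branch 3: no channel meets the threshold, so keep this channel's
    # noise at (target - highest other SN ratio).
    return (target_db - best_other_db) - n_ave_db
```

The third branch relaxes the noise ceiling when every participant is noisy, so that no channel is attenuated more than the best available SN ratio requires.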

In the automatic volume control device, the voice level calculation means smooths the voice level calculated for the current input audio signal frame using the voice levels obtained for the preceding frames, and outputs the result.

In the automatic volume control device, the noise level calculation means smooths the noise level calculated for the current input audio signal frame using the noise levels obtained for the preceding frames, and outputs the result.

In the automatic volume control device, the amplification factor calculation means smooths the amplification factor calculated for the current input audio signal frame using the amplification factors obtained for the preceding frames, and outputs the result.

A voice communication device according to one embodiment of the present invention includes a plurality of the automatic volume control devices and mixing means for mixing the audio signals output from the plurality of automatic volume control devices.

According to the present invention, the noise level can be kept from becoming excessive.

Consequently, even when a speaker with a poor SN ratio is present, other speakers' voices are less likely to become hard to hear.

Embodiments of the present invention are described below with reference to the drawings.

<First Embodiment>
FIG. 4 shows a configuration example of the first embodiment of the automatic volume control device. In the figure, the input audio signal from the terminal 11 is supplied to a VAD (Voice Activity Detection) unit 12. The audio signal is a digital audio signal sampled at, for example, a sampling frequency of 8 kHz, with 160 samples (corresponding to 20 msec) forming one frame.

The VAD unit 12 is, for example, the one described in Japanese Patent No. 3849116. For each frame of an input audio signal on which environmental noise is superimposed, supplied in time-series order, it calculates a voice feature quantity using the power, zero-crossing rate, peak frequency of the power spectrum, pitch period, and the like, and calculates a further voice feature quantity based on differences in only the higher-order components of the peak frequency of the power spectrum. Based on these two feature quantities, it determines whether the frame is voice or non-voice (that is, noise) and supplies the determination result, together with the audio signal, to the voice level update unit 13 and the noise level update unit 14.

When the current frame n is determined to be a voice portion, the voice level update unit 13 first obtains the voice level V of the current frame using equation (1).

In equation (1), input(k) denotes the amplitude of the input audio signal and M denotes the frame length. Next, the voice level V and the average voice level V_ave(n−1) of the previous frame are smoothed using the smoothing coefficient COF1 to obtain the average voice level V_ave(n) of the current frame, as in equation (2). When the current frame is determined to be a non-voice portion, the average voice level V_ave(n) is not updated. The average voice level V_ave(n) of the current frame is supplied to the amplification factor determination unit 15. The smoothing coefficient COF1 is, for example, a value of about 0.90 to 0.99.

V_ave(n) = V_ave(n−1) × COF1 + V × (1.0 − COF1)   … (2)
When the current frame n is determined to be a non-voice portion, the noise level update unit 14 first obtains the noise level N of the current frame using equation (3).

Next, the noise level N and the average noise level N_ave(n−1) of the previous frame are smoothed using the smoothing coefficient COF1 to obtain the average noise level N_ave(n) of the current frame, as in equation (4). When the current frame is determined to be a voice portion, the average noise level N_ave(n) is not updated. The average noise level N_ave(n) of the current frame is supplied to the amplification factor determination unit 15.

N_ave(n) = N_ave(n−1) × COF1 + N × (1.0 − COF1)   … (4)
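
Equations (2) and (4) are the same first-order recursive average, applied to the voice level on voice frames and to the noise level on non-voice frames. The sketch below illustrates that update only. The per-frame level (equations (1) and (3), which are not reproduced in this text) is taken as an input, and the class name, the initial values, and the default COF1 of 0.95 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LevelTracker:
    """Holds the smoothed voice and noise levels for one channel."""

    v_ave: float          # average voice level V_ave (initial value is an assumption)
    n_ave: float          # average noise level N_ave (initial value is an assumption)
    cof1: float = 0.95    # smoothing coefficient COF1, within the 0.90-0.99 range

    def update(self, is_voice: bool, frame_level: float) -> None:
        """Update exactly one of the two averages, depending on the VAD result."""
        if is_voice:
            # Equation (2): the noise average is left untouched on voice frames.
            self.v_ave = self.v_ave * self.cof1 + frame_level * (1.0 - self.cof1)
        else:
            # Equation (4): the voice average is left untouched on non-voice frames.
            self.n_ave = self.n_ave * self.cof1 + frame_level * (1.0 - self.cof1)
```

A COF1 closer to 1.0 makes the averages react more slowly to individual frames, which steadies the resulting gain at the cost of slower adaptation.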
The amplification factor determination unit 15 determines the gain s_gain of the current frame from the average voice level V_ave(n), the average noise level N_ave(n), and the target volume level, which is determined in advance and supplied from a host device via the terminal 16, and supplies the gain to the gain multiplication unit 17.

The gain multiplication unit 17 multiplies the input audio signal supplied from the terminal 11 by the amplification factor from the amplification factor determination unit 15 and outputs the output audio signal output(k) from the terminal 18.

output(k) = input(k) × s_gain   (where k = 1, 2, …, M)   … (5)
To prevent the amplification factor from changing abruptly at frame boundaries, the amplification factor determination unit 15 may be configured to vary the amplification factor smoothly in units of samples, for example as follows. Here, gain(n, k) is the gain at the k-th sample of frame n, s_gain is the instantaneous gain obtained from frame n alone, and COF2 is the smoothing coefficient. The smoothing coefficient COF2 is, for example, a value of about 0.90 to 0.99.

gain(n, k) = gain(n, k−1) × COF2 + s_gain × (1.0 − COF2)   … (6)
In this case, the output audio signal output(k) is as follows.

output(k) = input(k) × gain(n, k)   (where k = 1, 2, …, M)   … (7)
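
The per-sample smoothing of equations (6) and (7) can be sketched as a short loop. This is an illustration under assumptions: the gain is a linear multiplier, and the starting value gain(n, 0) is taken to be the last gain of the previous frame, which the text does not state explicitly. The function name and the default COF2 are hypothetical.

```python
def apply_smoothed_gain(frame, s_gain, prev_gain, cof2=0.95):
    """Apply a gain that glides toward s_gain sample by sample.

    frame     -- samples input(1..M) of the current frame
    s_gain    -- instantaneous (linear) gain computed for this frame
    prev_gain -- gain at the end of the previous frame (assumed carry-over)
    Returns the output samples and the gain to carry into the next frame.
    """
    out = []
    gain = prev_gain
    for sample in frame:
        gain = gain * cof2 + s_gain * (1.0 - cof2)  # equation (6)
        out.append(sample * gain)                   # equation (7)
    return out, gain
```

Returning the final gain lets the caller pass it back in as `prev_gain` for the next frame, so the gain trajectory stays continuous across frame boundaries.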
<Configuration of the amplification factor determination unit>
FIG. 5 shows a configuration example of one embodiment of the amplification factor determination unit 15. In the figure, the amplification factor determination unit 15 consists of an SN ratio calculation unit 21 and an amplification factor calculation unit 22.

The SN ratio calculation unit 21 calculates the SN ratio from the average voice level V_ave(n) of the current frame supplied by the voice level update unit 13 and the average noise level N_ave(n) of the current frame supplied by the noise level update unit 14, and supplies it to the amplification factor calculation unit 22.

The amplification factor calculation unit 22 calculates the gain s_gain of the current frame from the average voice level V_ave(n) of the current frame supplied by the voice level update unit 13, the average noise level N_ave(n) of the current frame supplied by the noise level update unit 14, the SN ratio supplied by the SN ratio calculation unit 21, and the target volume level supplied by the host device.

<Operation of the amplification factor calculation unit 22>
FIG. 6 shows a flowchart of an example of the processing executed by the amplification factor calculation unit 22. In step S1, the amplification factor calculation unit 22 determines whether the SN ratio supplied from the SN ratio calculation unit 21 is equal to or greater than a threshold. The threshold is set in advance to a value of, for example, about 12 dB.

If the SN ratio is equal to or greater than the threshold, the gain s_gain of the current frame is calculated in step S2 so that the average voice level V_ave(n) of the current frame becomes the target volume level.

If, on the other hand, the SN ratio is less than the threshold, the gain s_gain of the current frame is calculated in step S3 so that the average noise level N_ave(n) of the current frame becomes the value obtained by subtracting the threshold from the target volume level (target volume level − threshold).
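
Steps S1 to S3 reduce to a single comparison once the levels are expressed in dB. The sketch below assumes that the SN ratio is the average voice level minus the average noise level and that the gain is returned in dB. The function name is hypothetical, and the default threshold of 12 dB is the example value given above.

```python
def first_embodiment_gain_db(v_ave_db, n_ave_db, target_db, threshold_db=12.0):
    """Gain in dB for the current frame, following steps S1 to S3 of FIG. 6."""
    snr_db = v_ave_db - n_ave_db  # assumed definition of the SN ratio in dB
    if snr_db >= threshold_db:
        # S1 yes -> S2: bring the voice level to the target volume level.
        return target_db - v_ave_db
    # S1 no -> S3: bring the noise level to (target volume level - threshold).
    return (target_db - threshold_db) - n_ave_db
```

In the low-SN branch the voice ends up below the target level by exactly the shortfall between the threshold and the SN ratio, which is the price paid for capping the noise.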

<Configuration of the voice communication device in a multipoint voice communication system>
FIG. 7 shows a block diagram of the first embodiment of the receiving-side portion of a voice communication device in a multipoint voice communication system. In the figure, encoded audio signals received from a plurality of speakers are supplied to the terminals 21-1 to 21-n of the voice communication device 20, and each encoded audio signal is decoded by the corresponding one of the audio decoding units 22-1 to 22-n. Each decoded audio signal is supplied to the corresponding one of the AGC units 23-1 to 23-n.

Each of the AGC units 23-1 to 23-n is an automatic volume control device that has the configuration shown in FIGS. 4 and 5 and performs the operation shown in FIG. 6. Each controls the volume of its audio signal and supplies the output audio signal to the mixing unit 24. The mixing unit 24 mixes the audio signals supplied from the AGC units 23-1 to 23-n and outputs the result from the terminal 25.

<Specific volume control operation>
A specific volume control operation is now described for the case of n = 3 in FIG. 7. The target volume level given to each of the AGC units 23-1 to 23-3 is −24 dBov (ov: overload; the level in dB relative to the maximum value), and the SN ratio threshold is 12 dB.

As shown in FIG. 8, consider the case where the first speaker's signal input to the AGC unit 23-1 has a voice level of −30 dBov, a noise level of −54 dBov, and an SN ratio of 24 dB; the second speaker's signal input to the AGC unit 23-2 has a voice level of −42 dBov, a noise level of −48 dBov, and an SN ratio of 6 dB; and the third speaker's signal input to the AGC unit 23-3 has a voice level of −18 dBov, a noise level of −36 dBov, and an SN ratio of 18 dB.

In the AGC unit 23-1, the SN ratio of 24 dB is equal to or greater than the 12 dB threshold, so an amplification factor of 6 dB (amplification) is calculated to bring the voice level of −30 dBov to the target volume level of −24 dBov. As a result, the first speaker's signal output from the AGC unit 23-1 has a voice level of −24 dBov, a noise level of −48 dBov, and an SN ratio of 24 dB.

In the AGC unit 23-2, the SN ratio of 6 dB is less than the 12 dB threshold, so an amplification factor of 12 dB (amplification) is calculated to bring the noise level of −48 dBov to the target volume level minus 12 dB, that is, −36 dBov. As a result, the second speaker's signal output from the AGC unit 23-2 has a voice level of −30 dBov, a noise level of −36 dBov, and an SN ratio of 6 dB.

In the AGC unit 23-3, the SN ratio of 18 dB is equal to or greater than the 12 dB threshold, so an amplification factor of −6 dB (attenuation) is calculated to bring the voice level of −18 dBov to the target volume level of −24 dBov. As a result, the third speaker's signal output from the AGC unit 23-3 has a voice level of −24 dBov, a noise level of −42 dBov, and an SN ratio of 18 dB.

In this way, the gain for a speaker with a good SN ratio is determined so that the voice level becomes the target volume level, and the gain for a speaker with a poor SN ratio is determined so that the noise level does not exceed the value obtained by subtracting the threshold from the target volume level. This secures at least a certain SN ratio for the first and third speakers, avoids the problem of other speakers' voices becoming hard to hear because of the second speaker's poor SN ratio, and allows a good call.

In contrast, with the conventional technique that merely adjusts the input voice to the target volume level, consider the case shown in FIG. 9, where the target volume level is −24 dBov and the first to third speakers supply the same input as in FIG. 8. The amplification factors for the first and third speakers are the same as in the example of FIG. 8, but the amplification factor for the second speaker becomes 18 dB and its noise level becomes −30 dBov. As a result, the voices of the first and third speakers become hard to hear.
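
To make the difference between FIG. 8 and FIG. 9 concrete, the noise floors of the three channels can be combined into an estimate of the noise level after mixing. This calculation is not part of the source. It assumes that the noise of the three channels is uncorrelated, so that noise powers add, and that the dBov figures are power levels. The helper name is hypothetical.

```python
import math


def power_sum_db(levels_db):
    """Combine uncorrelated sources given in dB by adding their powers."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


# Per-channel output noise levels in dBov, taken from the worked examples above.
proposed_noise_db = [-48.0, -36.0, -42.0]      # FIG. 8: speakers 1, 2, 3
conventional_noise_db = [-48.0, -30.0, -42.0]  # FIG. 9: speakers 1, 2, 3

proposed_mix = power_sum_db(proposed_noise_db)
conventional_mix = power_sum_db(conventional_noise_db)
print(round(proposed_mix, 1), round(conventional_mix, 1))
```

Under these assumptions the mixed noise floor comes out at roughly −34.8 dBov for the proposed scheme and roughly −29.7 dBov for the conventional one, a difference of about 5 dB, dominated in both cases by the second speaker's channel.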

<Second Embodiment>
FIG. 10 shows a configuration example of the second embodiment of the automatic volume control device. In the figure, parts identical to those in FIG. 4 are given the same reference numerals.

In FIG. 10, the input audio signal from the terminal 11 is supplied to the VAD (Voice Activity Detection) unit 12. The audio signal is a digital audio signal sampled at, for example, a sampling frequency of 8 kHz, with 160 samples (corresponding to 20 msec) forming one frame.

When the current frame n is determined to be a voice portion, the voice level update unit 13 first obtains the voice level V of the current frame using equation (1).
Next, the voice level V and the average voice level V_ave(n−1) of the previous frame are smoothed using the smoothing coefficient COF1 to obtain the average voice level V_ave(n) of the current frame, as in equation (2). When the current frame is determined to be a non-voice portion, the average voice level V_ave(n) is not updated. The average voice level V_ave(n) of the current frame is supplied to the amplification factor determination unit 35. The smoothing coefficient COF1 is, for example, a value of about 0.90 to 0.99.

V_ave(n) = V_ave(n−1) × COF1 + V × (1.0 − COF1)   … (2)
When the current frame n is determined to be a non-voice portion, the noise level update unit 14 first obtains the noise level N of the current frame using equation (3).
Next, the noise level N and the average noise level N_ave(n−1) of the previous frame are smoothed using the smoothing coefficient COF1 to obtain the average noise level N_ave(n) of the current frame, as in equation (4). When the current frame is determined to be a voice portion, the average noise level N_ave(n) is not updated. The average noise level N_ave(n) of the current frame is supplied to the amplification factor determination unit 35.

N_ave(n) = N_ave(n−1) × COF1 + N × (1.0 − COF1)   … (4)
The amplification factor determination unit 35 determines the gain s_gain of the current frame from the average voice level V_ave(n), the average noise level N_ave(n), the target volume level, which is determined in advance and supplied from the host device via the terminal 16, and the SN ratios of the other speakers, which are supplied from the other automatic volume control devices via the terminal 36, and supplies the gain to the gain multiplication unit 17.

The gain multiplication unit 17 multiplies the input audio signal supplied from the terminal 11 by the amplification factor from the amplification factor determination unit 35 and outputs the output audio signal output(k) from the terminal 18.

output(k) = input(k) × s_gain   (where k = 1, 2, …, M)   … (5)
To prevent the amplification factor from changing abruptly at frame boundaries, the amplification factor determination unit 35 may be configured to vary the amplification factor smoothly in units of samples, for example by updating the per-sample gain with equation (6) of the first embodiment. Here, gain(n, k) is the gain at the k-th sample of frame n, s_gain is the instantaneous gain obtained from frame n alone, and COF2 is the smoothing coefficient. The smoothing coefficient COF2 is, for example, a value of about 0.90 to 0.99. In this case, the output audio signal output(k) is as follows.

output(k) = input(k) × gain(n, k)   (where k = 1, 2, …, M)   … (7)
<Configuration of the amplification factor determination unit>
FIG. 11 shows a configuration example of one embodiment of the amplification factor determination unit 35. In the figure, parts identical to those in FIG. 5 are given the same reference numerals. In FIG. 11, the amplification factor determination unit 35 consists of the SN ratio calculation unit 21, an SN ratio comparison unit 37, and an amplification factor calculation unit 38.

The SN ratio calculation unit 21 calculates the SN ratio from the average voice level V_ave(n) of the current frame supplied by the voice level update unit 13 and the average noise level N_ave(n) of the current frame supplied by the noise level update unit 14, and supplies it to the SN ratio comparison unit 37.

The SN ratio comparison unit 37 compares the SN ratio of its own speaker, calculated by the SN ratio calculation unit 21, with the threshold and with the SN ratios of the other speakers. It supplies the comparison result to the amplification factor calculation unit 38 together with the SN ratio of its own speaker, the threshold, and the SN ratios of the other speakers.

The amplification factor calculation unit 38 calculates the gain s_gain of the current frame from the following inputs: the average voice level V_ave(n) of the current frame from the voice level update unit 13, the average noise level N_ave(n) of the current frame from the noise level update unit 14, the SN ratio from the SN ratio calculation unit 21, the target volume level from the host device, and, from the SN ratio comparison unit 37, the comparison result, the SN ratio of its own speaker, the threshold, and the SN ratios of one or more other speakers.

<Operation of the amplification factor calculation unit 38>
FIG. 12 shows a flowchart of an example of the processing executed by the amplification factor calculation unit 38. In step S11, the amplification factor calculation unit 38 uses the comparison result supplied from the SN ratio comparison unit 37 to determine whether the SN ratio of its own speaker is the highest when compared with the SN ratios of the one or more other speakers. If the SN ratio of its own speaker is the highest, it calculates in step S12 the gain s_gain of the current frame such that the average voice level V_ave(n) of the current frame becomes the target volume level.

On the other hand, if the SN ratio of another speaker is higher than that of its own speaker, it calculates in step S13 the gain s_gain of the current frame such that the average noise level N_ave(n) of the current frame becomes the target volume level minus the SN ratio of the other speaker with the highest SN ratio (target volume level - SN ratio of the speaker with the maximum SN ratio).
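Steps S11 to S13 can be sketched as a small function. This is an illustration under stated assumptions rather than the patent's implementation: all levels are taken in dB (dBov), the SN ratio is taken as the dB difference between voice and noise level, and a tie with the best other speaker is treated as "highest", which matches the "equal to or higher" wording of the claims. The name `gain_fig12` and its parameters are hypothetical.

```python
def gain_fig12(v_ave, n_ave, other_snrs, target):
    """Return s_gain in dB for the current frame (steps S11 to S13).

    v_ave, n_ave: average voice / noise level of this unit [dBov]
    other_snrs:   SN ratios supplied by the other units [dB]
    target:       target volume level [dBov]
    """
    own_snr = v_ave - n_ave  # assumption: SN ratio as a dB difference
    best_other = max(other_snrs) if other_snrs else float("-inf")

    if own_snr >= best_other:          # S11: own speaker has the highest SN ratio
        return target - v_ave          # S12: bring the voice level to the target
    # S13: bring the noise level to (target - highest SN ratio)
    return (target - best_other) - n_ave


if __name__ == "__main__":
    # A unit whose speaker is at -42 dBov voice / -48 dBov noise,
    # while other units report SN ratios of 10 dB and 8 dB.
    print(gain_fig12(-42, -48, [10, 8], target=-24))
```

Returning the gain in dB keeps the decision logic separate from the sample-domain multiplication of equation (5), which needs a linear factor.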

<Configuration of the voice communication device of the multipoint voice communication system>
FIG. 13 shows a block diagram of a second embodiment of the receiving-side portion of a voice communication device in a multipoint voice communication system. In the figure, parts identical to those in FIG. 7 are given the same reference numerals.

In FIG. 13, encoded voice signals received from a plurality of speakers are supplied to terminals 21-1 to 21-n of the voice communication device 20, and each encoded voice signal is decoded by the corresponding one of the voice decoding units 22-1 to 22-n. Each decoded voice signal is supplied to the corresponding one of the AGC units 43-1 to 43-n.

Each of the AGC units 43-1 to 43-n is an automatic volume control device that has the configuration shown in FIGS. 10 and 11 and performs the operation shown in FIG. 12 (or FIG. 15). Each unit controls the volume of its voice signal and supplies the output voice signal to the mixing unit 24, and also obtains the SN ratio of its own speaker and supplies it to all the other automatic volume control devices. The mixing unit 24 mixes the voice signals supplied from the AGC units 43-1 to 43-n and outputs the result from terminal 25.
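The signal path from the AGC units to the mixing unit can be sketched as follows. The sketch assumes that each unit's gain is available in dB, that a dB gain maps to an amplitude factor of 10^(dB/20), and that mixing is a plain sample-wise sum. None of these details is stated in the passage, and the function names are illustrative.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to an amplitude factor (assumed 20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def agc_and_mix(frames, gains_db):
    """Scale each speaker's decoded frame by its own gain, then sum sample-wise.

    frames:   one list of samples per speaker, all of equal length
    gains_db: one gain in dB per speaker, as decided by that speaker's AGC unit
    """
    scaled = [
        [x * db_to_linear(g) for x in frame]
        for frame, g in zip(frames, gains_db)
    ]
    # Assumption: the mixing unit simply adds the streams sample by sample.
    return [sum(samples) for samples in zip(*scaled)]


if __name__ == "__main__":
    frames = [[0.10, -0.10], [0.02, 0.02], [0.30, -0.30]]
    print(agc_and_mix(frames, gains_db=[6, 14, -8]))
```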

<Specific volume control operation>
A specific volume control operation for the case n = 3 in FIG. 13 is described here. The target volume level given to each of the AGC units 43-1 to 43-3 is -24 dBov (ov: overload; a dB value expressing how far the level is below the maximum value), and the SN ratio threshold is 12 dB.

As shown in FIG. 14, consider the following case. The first speaker, input to the AGC unit 43-1, has a voice level of -30 dBov, a noise level of -40 dBov, and an SN ratio of 10 dB. The second speaker, input to the AGC unit 43-2, has a voice level of -42 dBov, a noise level of -48 dBov, and an SN ratio of 6 dB. The third speaker, input to the AGC unit 43-3, has a voice level of -18 dBov, a noise level of -26 dBov, and an SN ratio of 8 dB.

In this case, the first speaker, handled by the AGC unit 43-1, has the highest SN ratio. Because its SN ratio is the highest, the AGC unit 43-1 calculates an amplification factor of 6 dB (amplification) so that the voice level of -30 dBov becomes the target volume level of -24 dBov. As a result, the first speaker output by the AGC unit 43-1 has a voice level of -24 dBov, a noise level of -34 dBov, and an SN ratio of 10 dB.

In the AGC unit 43-2, the SN ratio of the first speaker (10 dB) is higher than that of its own second speaker (6 dB). It therefore calculates an amplification factor of 14 dB (amplification) so that the noise level of -48 dBov becomes the target volume level (-24 dBov) minus the SN ratio of the first speaker (10 dB), that is, -34 dBov. As a result, the second speaker output by the AGC unit 43-2 has a voice level of -28 dBov, a noise level of -34 dBov, and an SN ratio of 6 dB.

In the AGC unit 43-3, the SN ratio of the first speaker (10 dB) is higher than that of its own third speaker (8 dB). It therefore calculates an amplification factor of -8 dB (attenuation) so that the noise level of -26 dBov becomes the target volume level (-24 dBov) minus the SN ratio of the first speaker (10 dB), that is, -34 dBov. As a result, the third speaker output by the AGC unit 43-3 has a voice level of -26 dBov, a noise level of -34 dBov, and an SN ratio of 8 dB.
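The arithmetic of this three-speaker example can be checked with a few lines. The input levels and gains below are the values stated in the text. The only assumption is that a gain in dB shifts voice and noise level by the same amount, which is why each speaker's SN ratio is unchanged. The names are illustrative.

```python
# speaker: (voice level [dBov], noise level [dBov], gain decided by its AGC unit [dB])
SPEAKERS = {
    1: (-30, -40, 6),
    2: (-42, -48, 14),
    3: (-18, -26, -8),
}


def apply_db_gain(voice, noise, gain):
    """A gain in dB shifts both levels equally, so voice - noise is preserved."""
    return voice + gain, noise + gain


def summarize(speakers):
    """Return {speaker: (output voice, output noise, output SN ratio)}."""
    out = {}
    for idx, (voice, noise, gain) in speakers.items():
        v_out, n_out = apply_db_gain(voice, noise, gain)
        out[idx] = (v_out, n_out, v_out - n_out)
    return out


if __name__ == "__main__":
    for idx, (v, n, snr) in summarize(SPEAKERS).items():
        print(f"speaker {idx}: voice {v} dBov, noise {n} dBov, SN ratio {snr} dB")
```

All three noise floors end up at the same level (-34 dBov), and only the speaker with the best SN ratio reaches the target volume level.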

In this way, for a speaker with a good SN ratio the gain is determined so that the voice level becomes the target volume level, and for a speaker with a poor SN ratio the gain is determined so that the noise level does not exceed the target volume level minus the SN ratio of the speaker with the highest SN ratio. This maintains the SN ratio of the speaker with the highest SN ratio, so a speaker with a poor SN ratio does not make the other speakers' voices hard to hear, and a good call becomes possible.

<Other operation of the amplification factor calculation unit 38>
FIG. 15 shows a flowchart of another example of the processing executed by the amplification factor calculation unit 38. In step S21, the amplification factor calculation unit 38 uses the comparison result supplied from the SN ratio comparison unit 37 to determine whether the SN ratio is equal to or greater than the threshold. The threshold is set in advance to a value of, for example, about 12 dB.

If the SN ratio is equal to or greater than the threshold, the unit calculates in step S22 the gain s_gain of the current frame such that the average voice level V_ave(n) of the current frame becomes the target volume level.

On the other hand, if the SN ratio is less than the threshold, the unit determines in step S23 whether the SN ratio of its own speaker is the highest when compared with the SN ratios of the one or more other speakers. If the SN ratio of its own speaker is the highest, it calculates in step S22 the gain s_gain of the current frame such that the average voice level V_ave(n) of the current frame becomes the target volume level.

If step S23 finds that the SN ratio of another speaker is higher than that of its own speaker, the unit determines in step S24 whether there is another speaker whose SN ratio is equal to or greater than the threshold. If there is such a speaker, it calculates in step S25 the gain s_gain of the current frame such that the average noise level N_ave(n) of the current frame becomes the target volume level minus the threshold (target volume level - threshold).

If step S24 finds no other speaker at or above the threshold, the unit calculates in step S26 the gain s_gain of the current frame such that the average noise level N_ave(n) of the current frame becomes the target volume level minus the SN ratio of the other speaker with the highest SN ratio (target volume level - SN ratio of the speaker with the maximum SN ratio).

In other words, when the SN ratios of all speakers are below the threshold, the SN ratio of the speaker with the highest SN ratio is treated as the threshold, and all the automatic volume control devices are aligned to that highest SN ratio.
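The flow of steps S21 to S26 can be sketched in the same style. As before, this is an illustration under assumptions: levels are in dB (dBov), the SN ratio is the dB difference between voice and noise level, ties count as "highest", and the name `gain_fig15` with its parameters is hypothetical. The default threshold of 12 dB is the example value from the text.

```python
def gain_fig15(v_ave, n_ave, other_snrs, target, threshold=12.0):
    """Return s_gain in dB for the current frame (steps S21 to S26)."""
    own_snr = v_ave - n_ave  # assumption: SN ratio as a dB difference

    if own_snr >= threshold:                 # S21: own SN ratio is good enough
        return target - v_ave                # S22: voice level -> target

    best_other = max(other_snrs) if other_snrs else float("-inf")
    if own_snr >= best_other:                # S23: own speaker is still the best
        return target - v_ave                # S22: voice level -> target

    if best_other >= threshold:              # S24: some other speaker meets the threshold
        return (target - threshold) - n_ave  # S25: noise -> target - threshold

    return (target - best_other) - n_ave     # S26: noise -> target - best SN ratio


if __name__ == "__main__":
    # Another speaker reaches 15 dB, so the noise floor is set from the threshold.
    print(gain_fig15(-42, -48, [15, 8], target=-24))
    # No speaker reaches the threshold, so the best SN ratio (10 dB) is used instead.
    print(gain_fig15(-42, -48, [10, 8], target=-24))
```

Capping the reference at the threshold in step S25 means a single very clean speaker does not force the other speakers' noise, and with it their voice, far below the target level.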

In this way, for a speaker with a good SN ratio the gain is determined so that the voice level becomes the target volume level, and for a speaker with a poor SN ratio the gain is determined so that the noise level does not exceed the target volume level minus the SN ratio of the speaker with the highest SN ratio. This secures at least a certain SN ratio for speakers with a good SN ratio, so a speaker with a poor SN ratio does not make the other speakers' voices hard to hear, and a good call becomes possible.

FIG. 1 is a block diagram of an example of a conventional automatic volume control device.
FIG. 2 is a block diagram of an example of the receiving-side portion of a conventional voice communication device.
FIG. 3 is a diagram for explaining the problem of the conventional voice communication device.
FIG. 4 is a diagram showing a configuration example of a first embodiment of the automatic volume control device.
FIG. 5 is a diagram showing a configuration example of one embodiment of the amplification factor determination unit 15.
FIG. 6 is a flowchart of an example of the processing executed by the amplification factor calculation unit 22.
FIG. 7 is a block diagram of a first embodiment of the receiving-side portion of a voice communication device in a multipoint voice communication system.
FIG. 8 is a diagram for explaining the automatic volume control of the first embodiment.
FIG. 9 is a diagram for explaining conventional automatic volume control.
FIG. 10 is a diagram showing a configuration example of a second embodiment of the automatic volume control device.
FIG. 11 is a diagram showing a configuration example of one embodiment of the amplification factor determination unit 35.
FIG. 12 is a flowchart of an example of the processing executed by the amplification factor calculation unit 38.
FIG. 13 is a block diagram of a second embodiment of the receiving-side portion of a voice communication device in a multipoint voice communication system.
FIG. 14 is a diagram for explaining the automatic volume control of the second embodiment.
FIG. 15 is a flowchart of another example of the processing executed by the amplification factor calculation unit 38.

Explanation of symbols

12 VAD unit
13 Voice level update unit
14 Noise level update unit
15, 35 Amplification factor determination unit
17 Gain multiplication unit
21 SN ratio calculation unit
22, 38 Amplification factor calculation unit
23-1 to 23-n, 43-1 to 43-n AGC unit
24 Mixing unit
37 SN ratio comparison unit

Claims

An automatic volume control device comprising:
voice determination means for determining a voice portion and a non-voice portion of an input voice signal;
voice level calculating means for calculating a voice level in the voice portion of the input voice signal;
noise level calculating means for calculating a noise level in the non-voice portion of the input voice signal;
SN ratio calculating means for calculating an SN ratio of the input voice signal from the voice level and the noise level;
amplification factor calculating means for calculating an amplification factor for the input voice signal from the voice level, the noise level, the SN ratio, and a preset target volume level; and
amplifying means for amplifying the input voice signal by the amplification factor and outputting it,
wherein the amplification factor calculating means calculates the amplification factor so that the voice level becomes the target volume level when the SN ratio is equal to or greater than a threshold value, and calculates the amplification factor so that the noise level becomes a value lower than the target volume level by the threshold value when the SN ratio is less than the threshold value.

An automatic volume control device comprising:
voice determination means for determining a voice portion and a non-voice portion of an input voice signal;
voice level calculating means for calculating a voice level in the voice portion of the input voice signal;
noise level calculating means for calculating a noise level in the non-voice portion of the input voice signal;
SN ratio calculating means for calculating an SN ratio of the input voice signal from the voice level and the noise level;
amplification factor calculating means for calculating an amplification factor for the input voice signal from the voice level, the noise level, the SN ratio calculated by the SN ratio calculating means, a preset target volume level, and the SN ratios supplied from one or more other automatic volume control devices; and
amplifying means for amplifying the input voice signal by the amplification factor and outputting it,
wherein the amplification factor calculating means calculates the amplification factor so that the voice level becomes the target volume level when the SN ratio calculated by the SN ratio calculating means is equal to or greater than the SN ratios supplied from the one or more other automatic volume control devices, and calculates the amplification factor so that the noise level becomes a value lower than the target volume level by the highest of the supplied SN ratios when the SN ratio calculated by the SN ratio calculating means is less than an SN ratio supplied from the one or more other automatic volume control devices.

The automatic volume control device according to claim 2,
wherein the amplification factor calculating means: calculates the amplification factor so that the voice level becomes the target volume level when the SN ratio calculated by the SN ratio calculating means is equal to or greater than a threshold value, or is equal to or greater than the SN ratios supplied from the one or more other automatic volume control devices; calculates the amplification factor so that the noise level becomes a value lower than the target volume level by the threshold value when the SN ratio calculated by the SN ratio calculating means is less than the threshold value and any of the SN ratios supplied from the one or more other automatic volume control devices is equal to or greater than the threshold value; and calculates the amplification factor so that the noise level becomes a value lower than the target volume level by the highest of the SN ratios supplied from the one or more other automatic volume control devices when the SN ratio calculated by the SN ratio calculating means is less than the threshold value and all of the SN ratios supplied from the one or more other automatic volume control devices are less than the threshold value.

The automatic volume control device according to any one of claims 1 to 3,
wherein the voice level calculating means smooths the voice level calculated for the current frame of the input voice signal with the voice level obtained for the previous frame of the input voice signal, and outputs the smoothed voice level.

The automatic volume control device according to any one of claims 1 to 4,
wherein the noise level calculating means smooths the noise level calculated for the current frame of the input voice signal with the noise level obtained for the previous frame of the input voice signal, and outputs the smoothed noise level.

The automatic volume control device according to any one of claims 1 to 5,
wherein the amplification factor calculating means smooths the amplification factor calculated for the current frame of the input voice signal with the amplification factor obtained for the previous frame of the input voice signal, and outputs the smoothed amplification factor.

A voice communication device comprising:
a plurality of automatic volume control devices according to any one of claims 1 to 6; and
mixing means for mixing the voice signals output from the plurality of automatic volume control devices.