JP2010050512A

JP2010050512A - Voice mixing device, and program

Info

Publication number: JP2010050512A
Application number: JP2008210436A
Authority: JP
Inventors: Shun Harada; 瞬原田; Takushi Murakami; 卓志村上
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2008-08-19
Filing date: 2008-08-19
Publication date: 2010-03-04

Abstract

PROBLEM TO BE SOLVED: To provide a voice mixing device that lets listeners catch the voices of all speakers when a voice conference is held in an environment with much background noise. SOLUTION: The voice mixing device includes: a voice information receiving means for receiving a plurality of pieces of voice information from the outside; a voiced head part presence/absence detecting means for detecting, for each piece of voice information output by the voice information receiving means, whether a voiced head part is present; a voice number detecting means for detecting the number of voices at or above a voiced threshold among the pieces of voice information output by the voice information receiving means; a gain control means for raising, when the output of the voice number detecting means is less than an effective voice number threshold, only the volume of the voice information for which the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information; a voice information combining means 124 for adding the outputs of the gain control means for each piece of voice information; and voice information transmitting means 101 and 102 for transmitting the output of the voice information combining means to the outside. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a voice mixing device used in a voice call system that receives a plurality of pieces of voice information, and in particular to a voice mixing device in a voice control server for the case where flight attendants and the like hold a conference, each remaining at their assigned post in an airplane with much background noise, using handsets that each consist of a microphone and a speaker.

A conventional voice mixing device is described below.

A conventional voice mixing device is known from Patent Document 1. FIG. 9 is a block diagram showing the configuration of the voice mixing device described in Patent Document 1. In FIG. 9, 1 to 3 are voice information transmission/reception units, 4 to 6 are signal level detection units, 7 is a priority selection unit, 8 is a control unit, and 9 is a voice information synthesis unit.

The operation of the conventional voice mixing device configured as above is described below. First, voice information arriving via the network is received by the voice information transmission/reception units 1 to 3 and passed to the signal level detection units 4 to 6 and the priority selection unit 9. If voices from five or more conference rooms are synthesized unconditionally, the echoes from each conference room are superimposed and the synthesized voice becomes hard to hear; for operational and technical reasons, synthesis of about four sites is the limit. Therefore the control unit 8, having received from the signal level detection units 4 to 6 the presence or absence of voice and the time at which the voiced portion of each voice was recognized, checks whether the number of lines on which a voiced portion was detected is smaller than a preset number of lines N (N is a positive integer). If it is smaller, the control unit controls the priority selection unit 9 and connects only the lines on which a voiced portion was detected to the voice information synthesis unit 7. If it is larger, the control unit checks in time series the times at which the voiced portions were recognized on these lines, selects N lines starting from the earliest in the order in which the voices occurred, controls the priority selection unit 9, and connects the selected N lines to the voice information synthesis unit 7.
Patent Document 1: JP-A-4-84553
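The conventional selection rule described above can be sketched as follows. This is only an illustrative sketch, not code from Patent Document 1: the function name `select_lines`, the dictionary of onset times, and the line identifiers are all assumptions made for the example.

```python
def select_lines(onset_times, n_max):
    """Pick the lines to pass to the synthesis unit (hypothetical helper).

    onset_times maps a line id to the time at which a voiced portion was
    recognized on that line; lines without voice are simply absent.
    """
    if len(onset_times) < n_max:
        # Fewer voiced lines than N: connect every voiced line.
        return sorted(onset_times)
    # Otherwise keep the N lines whose voice started earliest.
    earliest_first = sorted(onset_times, key=lambda line: onset_times[line])
    return sorted(earliest_first[:n_max])
```

With five voiced lines and N = 4, the line whose voice started last is dropped regardless of what is being said on it, which is the drawback discussed next.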

However, because the above conventional configuration checks in time series the times at which voiced portions were recognized and selects lines starting from the earliest in the order in which the voices occurred, it has the problem that, in some cases, the voice of a person making an important remark is discarded.

It also has the problem that speech is hard to hear when there is much background noise.

The present invention solves the above conventional problems, and its object is to provide a voice mixing device with which the voices of all speakers can be heard when a voice conference is used in an environment with much background noise.

To achieve this object, the voice mixing device of the present invention comprises: voice information receiving means for receiving a plurality of pieces of voice information from the outside; voiced head part presence/absence detecting means for detecting the presence or absence of a voiced head part in the output of the voice information receiving means for each piece of voice information; gain control means for raising only the volume of the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information; voice information synthesizing means for adding the outputs of the gain control means for each piece of voice information; and voice information transmitting means for transmitting the output of the voice information synthesizing means to the outside.

The device further comprises voice number detecting means for detecting the number of voices at or above a voiced threshold among the outputs of the voice information receiving means for each piece of voice information, and the gain control means, when the output of the voice number detecting means is less than an effective voice number threshold, raises only the volume of the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information.

As described above, the present invention provides the excellent effect that the voices of all speakers can be heard when a voice conference is used in an environment with much background noise.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of the voice mixing device according to Embodiment 1 of the present invention. When the number of input voices is small, speech is easier to hear if the gain of the voiced head part of a conversation is raised, even when there is much background noise. When the number of input voices is large, however, raising the gain of the head of each utterance makes the voice of a speaker who joins partway through an obstacle to the ongoing conversation, which becomes hard to hear. The device therefore has the following configuration.
Reference numeral 100 denotes a network that carries voice information to and from voice terminals such as handsets. 101 to 102 denote a first to an M-th (M is a positive integer) voice information transmission/reception unit, which transmit and receive voice information to and from the network 100. 103 to 104 denote a first to an M-th signal level detection unit, which detect, from the reception outputs of the first voice information transmission/reception unit 101 to the M-th voice information transmission/reception unit 102, the presence or absence of a voice level at or above the voiced threshold 120 (hereinafter, "voiced level"). 105 to 106 denote a first to an M-th voiced head part presence/absence detection unit, which detect, in the outputs of the first signal level detection unit 103 to the M-th signal level detection unit 104, the presence or absence of a voiced head part, that is, the head of a portion whose voice level is at or above the voiced threshold 120. 121 denotes a voice number detection unit, which detects the number of voices at or above the voiced threshold 120 (hereinafter, "number of effective voices") among the outputs of the first signal level detection unit 103 to the M-th signal level detection unit 104. 123 denotes a gain change determination unit, which determines whether the number of effective voices is less than the effective voice number threshold 122. 107 to 108 denote a first to an M-th gate unit, which gate the outputs of the first voiced head part presence/absence detection unit 105 to the M-th voiced head part presence/absence detection unit 106 using the output of the gain change determination unit 123 as a gate signal. 109 to 110 denote a first to an M-th gain control unit, which control the gain of the reception outputs of the first voice information transmission/reception unit 101 to the M-th voice information transmission/reception unit 102 based on the outputs of the first gate unit 107 to the M-th gate unit 108. 124 denotes a voice information synthesis unit, which adds the outputs of the first gain control unit 109 to the M-th gain control unit 110. The output of the voice information synthesis unit 124 is supplied to the first voice information transmission/reception unit 101 to the M-th voice information transmission/reception unit 102 and sent to the voice terminals via the network 100.

Next, the operation of the voice mixing device according to Embodiment 1 of the present invention will be described with reference to FIGS. 2 to 8. FIG. 2 is a schematic diagram of the output of the m-th signal level detection unit (m is any integer from 1 to M) in Embodiment 1 of the present invention. The m-th signal level detection unit, which is one of the first signal level detection unit 103 to the M-th signal level detection unit 104, first calculates the analog voiced level V(n)(m), which is the signal level of the voiced portion at time t(n). It then compares the analog voiced level with the voiced threshold 120 and calculates and outputs the binarized voiced level BV(n)(m), which is 1 when the level is at or above the voiced threshold and 0 when it is below the voiced threshold. For silence and unvoiced sound, the level value is 0.
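The binarization step can be illustrated with a short sketch. The patent does not say how the analog voiced level is measured, so the RMS measure in `analog_level` is an assumption, as are the function names and the sample values.

```python
import math


def analog_level(frame):
    """Level V(n)(m) of one frame of samples; RMS is an assumed measure."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def binarize(level, voiced_threshold):
    """Binarized voiced level BV(n)(m): 1 at or above the threshold, else 0."""
    return 1 if level >= voiced_threshold else 0
```

A level exactly equal to the voiced threshold counts as voiced, matching the "at or above" wording of the description.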

FIG. 3 is a schematic diagram of the output of the m-th voiced head part presence/absence detection unit in Embodiment 1 of the present invention. The m-th voiced head part presence/absence detection unit, which is one of the first voiced head part presence/absence detection unit 105 to the M-th voiced head part presence/absence detection unit 106, detects the presence or absence of a voiced head part from the output of the m-th signal level detection unit. For example, a voiced head part can be regarded as detected only when the value at the current time minus the value at the previous time, BV(n)(m) - BV(n-1)(m), is 1, and as not detected when the difference is 0 or -1. The unit outputs a signal indicating gain up when a voiced head part is detected and gain unchanged otherwise.
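The difference rule given as an example above amounts to rising-edge detection on the binarized level. A minimal sketch follows; the function names and the assumption that the channel is silent before the first frame (`initial=0`) are illustrative, not taken from the patent.

```python
def detect_voiced_head(bv_now, bv_prev):
    """True only when BV(n)(m) - BV(n-1)(m) equals 1 (a 0 -> 1 transition)."""
    return bv_now - bv_prev == 1


def head_flags(bv_sequence, initial=0):
    """Apply the rule along one channel's sequence of binarized levels."""
    flags = []
    prev = initial  # assumed: silence before the first frame
    for bv in bv_sequence:
        flags.append(detect_voiced_head(bv, prev))
        prev = bv
    return flags
```

For the sequence 0, 1, 1, 0, 1 the head is flagged at the second and fifth frames only; frames in the middle of an utterance (difference 0) and at its end (difference -1) are not flagged.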

FIG. 4 is a schematic diagram of the output of the voice number detection unit 121 in Embodiment 1 of the present invention. The voice number detection unit 121 detects, as the number of effective voices, the sum of the binarized voiced levels BV(n)(m) output by the first signal level detection unit 103 to the M-th signal level detection unit 104. The number of effective voices at time t(n) is denoted VP(n).
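As a small illustration, VP(n) is just the column sum of the binarized levels over the M channels. The layout `bv[m][n]` (one row per channel, one column per time step) and the function name are assumptions for this sketch.

```python
def effective_voice_counts(bv):
    """Return VP(n) for every time step n, given bv[m][n] for M channels."""
    # zip(*bv) regroups the rows (channels) into columns (time steps).
    return [sum(column) for column in zip(*bv)]
```

For three channels with levels `[1, 1, 0]`, `[0, 1, 1]`, and `[0, 0, 1]`, the counts over the three time steps are 1, 2, and 2.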

FIG. 5 is a schematic diagram of the output of the gain change determination unit in Embodiment 1 of the present invention. The gain change determination unit 123 compares the number of effective voices with the effective voice number threshold 122 to determine whether the number is below the threshold. It outputs gain change OFF when the number of effective voices is at or above the effective voice number threshold, and gain change ON when the number is below the threshold.

FIG. 6 is a schematic diagram of the output of the m-th gate unit in Embodiment 1 of the present invention. The m-th gate unit, which is one of the first gate unit 107 to the M-th gate unit 108, controls the output of the m-th voiced head part presence/absence detection unit by means of the output of the gain change determination unit 123. For example, when the m-th voiced head part presence/absence detection unit indicates gain up, the gate unit outputs gain up if the output of the gain change determination unit 123 is gain change ON, and gain unchanged if it is gain change OFF. When the m-th voiced head part presence/absence detection unit indicates gain unchanged, the gate unit outputs gain unchanged whether the output of the gain change determination unit 123 is gain change ON or gain change OFF. The outputs of the first gate unit 107 to the M-th gate unit 108 control the gains of the first gain control unit 109 to the M-th gain control unit 110, respectively. For gain up the volume is raised; for gain unchanged the volume is not changed.
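The determination and gating logic reduces to a logical AND of "voiced head part detected" and "gain change ON". The sketch below shows this under stated assumptions: the constant names, the function names, and the boost factor of 2.0 are illustrative, since the patent does not specify by how much the volume is raised.

```python
GAIN_UP = "gain_up"
GAIN_UNCHANGED = "gain_unchanged"


def gain_change_on(vp, effective_voice_threshold):
    """Gain change is ON only while VP(n) is below the threshold 122."""
    return vp < effective_voice_threshold


def gate(head_detected, change_on):
    """Pass gain up through only when the gate signal is ON."""
    return GAIN_UP if head_detected and change_on else GAIN_UNCHANGED


def apply_gain(samples, gate_output, boost=2.0):
    """Scale one channel's samples; boost=2.0 is an assumed factor."""
    factor = boost if gate_output == GAIN_UP else 1.0
    return [s * factor for s in samples]
```

Of the four input combinations, only "head detected" together with "gain change ON" yields gain up, which matches the cases enumerated in the paragraph above.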

FIG. 7 is a flowchart of the gain control of the voice mixing device in Embodiment 1 of the present invention. In S701, voice information arriving via the network 100 is received by the first voice information transmission/reception unit 101 to the M-th voice information transmission/reception unit 102, and the first to M-th voice information at time t(n) is input. In S702, the gains of all of the first gain control unit 109 to the M-th gain control unit 110 are reset. In S703, the first signal level detection unit 103 to the M-th signal level detection unit 104 each detect the voiced level of the output of the corresponding one of the first voice information transmission/reception unit 101 to the M-th voice information transmission/reception unit 102. In S704, the first voiced head part presence/absence detection unit 105 to the M-th voiced head part presence/absence detection unit 106 each detect the presence or absence of a voiced head part from the output of the corresponding one of the first signal level detection unit 103 to the M-th signal level detection unit 104. In S705, the voice number detection unit 121 detects the number of effective voices. In S706, the gain change determination unit 123 determines whether the number of effective voices is less than the effective voice number threshold. If the number is at or above the threshold, the process proceeds to S707; if it is below the threshold, the process proceeds to S708. In S707, the gain is left unchanged for the first gain control unit 109 to the M-th gain control unit 110. In S708, the gain is raised only for the gain control units for which a voiced head part was detected, and left unchanged for the other gain control units. In S709, the outputs of the first gain control unit 109 to the M-th gain control unit 110 are input to the voice information synthesis unit 124 and synthesized. The output of the voice information synthesis unit 124 is input to the first voice information transmission/reception unit 101 to the M-th voice information transmission/reception unit 102 and sent to each speaker via the network 100. In S710, the processing for time t(n) ends and the device waits for the processing at time t(n+1).
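One pass through the flowchart (S702 to S709) can be sketched as a single function. This is a simplified sketch rather than the patented implementation: the RMS level measure, the boost factor of 2.0, the frame layout (M equal-length lists of samples), and the name `mix_step` are all assumptions.

```python
import math


def rms(frame):
    """Assumed level measure for the signal level detection units."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def mix_step(frames, prev_bv, voiced_threshold, effective_voice_threshold,
             boost=2.0):
    """Process the M frames received for time t(n); return (mixed, bv)."""
    gains = [1.0] * len(frames)                                    # S702
    bv = [1 if rms(f) >= voiced_threshold else 0 for f in frames]  # S703
    heads = [now - prev == 1 for now, prev in zip(bv, prev_bv)]    # S704
    vp = sum(bv)                                                   # S705
    if vp < effective_voice_threshold:                             # S706
        gains = [boost if head else 1.0 for head in heads]         # S708
    # Otherwise S707: every gain stays at its reset value.
    mixed = [sum(g * s for g, s in zip(gains, column))             # S709
             for column in zip(*frames)]
    return mixed, bv
```

The returned `bv` is passed back in as `prev_bv` on the next call, which corresponds to waiting for time t(n+1) in S710.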

FIG. 8 is a voice state diagram for explaining the operation of the voice mixing device in Embodiment 1 of the present invention. Suppose that speakers A to E are taking part in a conversation that proceeds with the timing shown, and that the effective voice number threshold is 3. When the voiced head part of speaker B is detected at time t(a), the number of effective voices is calculated. Speaker A is speaking, so the number of effective voices including B is 2. This is below the effective voice number threshold, so speaker B's voice packets are gained up and then synthesized. When the voiced head part of speaker E is detected at time t(b), A and B are speaking, so the number of effective voices is 3. This is at or above the effective voice number threshold, so speaker E's voice packets are synthesized with the gain unchanged. The same holds at time t(c): the number of effective voices is 3, which is at or above the threshold, so speaker A's voice packets are synthesized with the gain unchanged. When the voiced head part of speaker D is detected at time t(d), the number of effective voices is 2, so speaker D's voice packets are gained up and synthesized.
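The four decisions in this walk-through can be reproduced with a few lines. The event list below copies the speakers and effective voice counts stated in the text; the tuple layout and the function name `decide` are assumptions for illustration.

```python
EFFECTIVE_VOICE_THRESHOLD = 3  # value used in the FIG. 8 example


def decide(vp, threshold=EFFECTIVE_VOICE_THRESHOLD):
    """Decision applied to a channel whose voiced head part was just detected."""
    return "gain up" if vp < threshold else "gain unchanged"


# (time, speaker whose voiced head part is detected, number of effective voices)
events = [("t(a)", "B", 2), ("t(b)", "E", 3), ("t(c)", "A", 3), ("t(d)", "D", 2)]
decisions = {time: (speaker, decide(vp)) for time, speaker, vp in events}
```

Here `decisions["t(a)"]` is `("B", "gain up")` and `decisions["t(b)"]` is `("E", "gain unchanged")`, in line with the description.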

As described above, according to Embodiment 1, when a small number of people use a voice conference in an environment with much background noise, the gain of the head of each utterance, which is considered hard to hear, is raised, so requests to repeat become fewer and the conversation can proceed efficiently. When many people use the voice conference, on the other hand, the gain-up of the head of an interrupting utterance is stopped, so the voice of a speaker who joins partway through does not obstruct the ongoing conversation. The user can thus use the voice conference without operations that disturb the meeting, such as volume adjustment.

The above description assumed M each of the voice information transmission/reception units, signal level detection units, voiced head part presence/absence detection units, gate units, and gain control units. When time-division processing is used, however, one of each suffices.

The means described above can also be implemented as software processing, using a program that causes those means to function.

Furthermore, in the above description the inputs to the first signal level detection unit 103 to the M-th signal level detection unit 104 were the reception inputs from the first voice information transmission/reception unit 101 to the M-th voice information transmission/reception unit 102. The inputs to the first signal level detection unit 103 to the M-th signal level detection unit 104 may instead be those reception inputs after the participants' volume levels have been adjusted by automatic gain control (AGC). In this case the AGC removes differences in the participants' volume levels caused by, for example, differing distances between microphone and speaker, which gives a more comfortable call environment.
(Embodiment 2)
Embodiment 1 described a voice mixing device that can be applied regardless of the number of participants. However, when the number of participants is known to be below the effective voice number threshold, there is no need to consider the number of effective voices, and the device can be configured as follows.

FIG. 9 is a block diagram showing the configuration of the voice mixing device according to Embodiment 2 of the present invention. Compared with FIG. 1, the means and signals that can be omitted are the voice number detection unit 121, the gain change determination unit 123, the first gate unit 107 to the M-th gate unit 108, and the effective voice number threshold 122. In this case, the outputs of the first voiced head part presence/absence detection unit 105 to the M-th voiced head part presence/absence detection unit 106, which are signals indicating gain up when a voiced head part is detected and gain unchanged otherwise, are supplied directly to the first gain control unit 109 to the M-th gain control unit 110.
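With the gate removed, the gain for each channel depends only on its own voiced head detection. A minimal sketch follows, assuming the same difference rule as in Embodiment 1 and an illustrative boost factor of 2.0; the function name is hypothetical.

```python
def gains_embodiment2(bv_now, bv_prev, boost=2.0):
    """Per-channel gain when the head detector drives the gain directly."""
    return [boost if now - prev == 1 else 1.0
            for now, prev in zip(bv_now, bv_prev)]
```

For current levels `[1, 1, 0]` and previous levels `[0, 1, 1]`, only the first channel has just started speaking, so the gains are `[2.0, 1.0, 1.0]` however many channels are active.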

FIG. 10 is a flowchart of the gain control of the voice mixing device in Embodiment 2 of the present invention. Compared with FIG. 7, the steps that can be omitted are: detecting the number of effective voices in the voice number detection unit 121 (S705); determining in the gain change determination unit 123 whether the number of effective voices is less than the effective voice number threshold (S706); and leaving the gain unchanged for all of the first gain control unit 109 to the M-th gain control unit 110 (S707).

As described above, according to Embodiment 2, the means and the like that are needed when many people use the voice conference can be omitted compared with Embodiment 1, so the user can use the voice conference economically and without operations that disturb the meeting, such as volume adjustment.

The above description assumed M each of the voice information transmission/reception units, signal level detection units, voiced head part presence/absence detection units, and gain control units. When time-division processing is used, however, one of each suffices.

Because the voice mixing device of the present invention has the excellent effect that the voices of all speakers can be heard at an appropriate volume, it is useful in intercommunication systems and the like in environments with high background noise, such as inside an airplane.

FIG. 1: Block diagram showing the configuration of the voice mixing device in Embodiment 1 of the present invention
FIG. 2: Schematic diagram of the output of the m-th signal level detection unit in Embodiment 1 of the present invention
FIG. 3: Schematic diagram of the output of the m-th voiced head part presence/absence detection unit in Embodiment 1 of the present invention
FIG. 4: Schematic diagram of the output of the voice number detection unit in Embodiment 1 of the present invention
FIG. 5: Schematic diagram of the output of the gain change determination unit in Embodiment 1 of the present invention
FIG. 6: Schematic diagram of the output of the m-th gate unit in Embodiment 1 of the present invention
FIG. 7: Flowchart of the gain control of the voice mixing device in Embodiment 1 of the present invention
FIG. 8: Voice state diagram for explaining the operation of the voice mixing device in Embodiment 1 of the present invention
FIG. 9: Block diagram showing the configuration of the voice mixing device in Embodiment 2 of the present invention
FIG. 10: Flowchart of the gain control of the voice mixing device in Embodiment 2 of the present invention
Block diagram showing the configuration of a conventional voice mixing device

Explanation of symbols

100 Network
101 to 102 First to M-th voice information transmission/reception units
103 to 104 First to M-th signal level detection units
105 to 106 First to M-th voiced head part presence/absence detection units
107 to 108 First to M-th gate units
109 to 110 First to M-th gain control units
120 Voiced threshold
121 Voice number detection unit
122 Effective voice number threshold
123 Gain change determination unit
124 Voice information synthesis unit

Claims

1. A voice mixing device comprising:
voice information receiving means for receiving a plurality of pieces of voice information from the outside;
voiced head part presence/absence detecting means for detecting the presence or absence of a voiced head part in the output of the voice information receiving means for each piece of voice information;
gain control means for raising only the volume of the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information;
voice information synthesizing means for adding the outputs of the gain control means for each piece of voice information; and
voice information transmitting means for transmitting the output of the voice information synthesizing means to the outside.

2. The voice mixing device according to claim 1, further comprising voice number detecting means for detecting the number of voices at or above a voiced threshold among the outputs of the voice information receiving means for each piece of voice information,
wherein the gain control means, when the output of the voice number detecting means is less than an effective voice number threshold, raises only the volume of the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, and does not raise the volume of the other voice information.

3. A program for causing a computer to function as:
voice information receiving means for receiving a plurality of pieces of voice information from the outside;
voiced head part presence/absence detecting means for detecting the presence or absence of a voiced head part in the output of the voice information receiving means for each piece of voice information;
gain control means for raising only the volume of the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information;
voice information synthesizing means for adding the outputs of the gain control means for each piece of voice information; and
voice information transmitting means for transmitting the output of the voice information synthesizing means to the outside.

4. The program according to claim 3, further causing the computer to function as voice number detecting means for detecting the number of voices at or above a voiced threshold among the outputs of the voice information receiving means for each piece of voice information,
wherein the gain control means, when the output of the voice number detecting means is less than an effective voice number threshold, raises only the volume of the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, and does not raise the volume of the other voice information.