JP2010050512A - Voice mixing device, and program - Google Patents


Info

Publication number
JP2010050512A
Authority
JP
Japan
Prior art keywords
voice
voice information
head part
output
detecting
Legal status
Pending
Application number
JP2008210436A
Other languages
Japanese (ja)
Inventor
Shun Harada
瞬 原田
Takushi Murakami
卓志 村上
Current Assignee
Panasonic Corp
Original Assignee
Panasonic Corp
Application filed by Panasonic Corp
Priority to JP2008210436A
Publication of JP2010050512A

Landscapes

  • Circuit For Audible Band Transducer (AREA)
  • Telephone Function (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a voice mixing device that lets participants hear the voices of all speakers when holding a voice conference in an environment with heavy background noise.
SOLUTION: The voice mixing device includes: voice information receiving means for receiving a plurality of pieces of voice information from outside; voiced-onset presence detecting means for detecting whether a voiced onset (the leading part of a voiced segment) is present in the output for each piece of voice information from the voice information receiving means; voice count detecting means for detecting the number of voices at or above a voice threshold among those outputs; gain control means that, when the output of the voice count detecting means is less than a valid-voice-count threshold, raises the volume of only the voice information in which a voiced onset has been detected and does not raise the volume of the other voice information; voice information synthesizing means 124 for adding the outputs of the gain control means for each piece of voice information; and voice information transmitting means 101 and 102 for transmitting the output of the voice information synthesizing means to outside.
COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a voice mixing device used in a voice call system to which a plurality of pieces of voice information are input. In particular, it relates to a voice mixing device in a central voice server used when flight attendants or other crew hold a conference from their respective assigned stations in an aircraft with heavy background noise, each using a handset consisting of a microphone and a speaker.

A conventional voice mixing device is described below.

One known conventional voice mixing device is described in Patent Document 1. FIG. 11 is a block diagram showing the configuration of the voice mixing device described in Patent Document 1. In FIG. 11, 1 to 3 are voice information transceivers, 4 to 6 are signal level detectors, 7 is a priority selector, 8 is a controller, and 9 is a voice information synthesizer.

The operation of the conventional voice mixing device configured as above is as follows. First, voice information arriving over the network is received by voice information transceivers 1 to 3 and passed to signal level detectors 4 to 6 and priority selector 7. If voices from five or more conference rooms are mixed unconditionally, the echoes from each room are superimposed and the mixed audio becomes hard to understand; for operational and technical reasons, mixing is therefore limited to about four sites. Controller 8 receives from signal level detectors 4 to 6 the presence or absence of voice and the time at which a voiced portion was recognized, and checks whether the number of lines on which a voiced portion was detected is smaller than a preset number N (N is a positive integer). If it is, controller 8 controls priority selector 7 so that only the lines on which a voiced portion was detected are connected to voice information synthesizer 9. If it is not, controller 8 checks the recognition times on those lines in chronological order, selects the N earliest lines in the order in which speech began, and controls priority selector 7 to connect the selected N lines to voice information synthesizer 9.
Patent Document 1: JP-A-4-84553
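The prior-art first-come selection can be sketched as below, assuming each line reports the time at which its voiced portion was recognized (`None` for a silent line). The function name `select_lines` and its arguments are hypothetical; the sketch only illustrates why a late speaker can be dropped regardless of what they are saying.

```python
from typing import Dict, List, Optional


def select_lines(onset_times: Dict[int, Optional[float]], n_max: int) -> List[int]:
    """Hypothetical sketch of the prior-art selection in Patent Document 1.

    onset_times maps a line id to the time its voiced portion was recognized,
    or None if no voice was detected on that line. n_max plays the role of N.
    """
    # Keep only lines on which a voiced portion was detected.
    active = [(t, line) for line, t in onset_times.items() if t is not None]
    if len(active) < n_max:
        # Fewer than N active lines: connect all of them.
        return sorted(line for _, line in active)
    # Otherwise keep the N lines whose speech started earliest.
    active.sort()
    return sorted(line for _, line in active[:n_max])


# Five active lines with N = 4: line 4 started last and is discarded.
print(select_lines({1: 0.5, 2: 0.2, 3: 0.1, 4: 0.9, 5: 0.3}, 4))
```

Line 4 is excluded purely because it started last, which is the drawback the description identifies next.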

However, in the conventional configuration described above, the times at which voiced portions are recognized are checked in chronological order and lines are selected earliest first, in the order in which speech began. As a result, the voice of a person making an important remark may be discarded.

Another problem is that speech is hard to hear when there is a lot of background noise.

The present invention solves these conventional problems. Its object is to provide a voice mixing device that lets participants hear the voices of all speakers when using a voice conference in an environment with heavy background noise.

To achieve this object, the voice mixing device of the present invention comprises: voice information receiving means for receiving a plurality of pieces of voice information from outside; voiced-onset presence detecting means for detecting whether a voiced onset is present in the output for each piece of voice information from the voice information receiving means; gain control means that raises the volume of only the voice information for which the output of the voiced-onset presence detecting means indicates a voiced onset, and does not raise the volume of the other voice information; voice information synthesizing means for adding the outputs of the gain control means for each piece of voice information; and voice information transmitting means for transmitting the output of the voice information synthesizing means to outside.

The device may further comprise voice count detecting means for detecting the number of voices at or above a voice threshold among the outputs for each piece of voice information from the voice information receiving means. In that case, the gain control means raises the volume of only the voice information for which the output of the voiced-onset presence detecting means indicates a voiced onset, and does not raise the volume of the other voice information, when the output of the voice count detecting means is less than a valid-voice-count threshold.

As described above, the present invention has the excellent effect that, when a voice conference is used in an environment with heavy background noise, the voices of all speakers can be heard.

Embodiments of the present invention are described below with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of the voice mixing device according to Embodiment 1 of the present invention. When there are few input voices, raising the gain of the voiced onset of an utterance makes it easier to hear even with heavy background noise. When there are many input voices, however, raising the gain of the first sound of an utterance makes the voice of a speaker who joins partway through disrupt the ongoing conversation, so that it becomes hard to hear. The device is therefore configured as follows.

Network 100 carries voice information to and from voice terminals such as handsets. The first to M-th (M is a positive integer) voice information transceivers 101 to 102 send and receive voice information over network 100. The first to M-th signal level detectors 103 to 104 detect, in the received outputs of transceivers 101 to 102, whether a voice level at or above voice threshold 120 (hereinafter, a "voiced level") is present. The first to M-th voiced-onset detectors 105 to 106 detect, in the outputs of signal level detectors 103 to 104, whether a voiced onset is present, that is, the leading part of a segment whose voice level is at or above voice threshold 120. Voice count detector 121 detects the number of voices at or above voice threshold 120 among the outputs of signal level detectors 103 to 104 (hereinafter, the "number of valid voices"). Gain change judgment unit 123 judges whether the number of valid voices is less than valid-voice-count threshold 122. In the first to M-th gates 107 to 108, the output of gain change judgment unit 123 acts as a gate signal on the outputs of voiced-onset detectors 105 to 106. The first to M-th gain controllers 109 to 110 control the gain of the received outputs of transceivers 101 to 102 according to the outputs of gates 107 to 108. Voice information synthesizer 124 adds the outputs of gain controllers 109 to 110. The output of synthesizer 124 is supplied to transceivers 101 to 102 and sent to the voice terminals over network 100.

Next, the operation of the voice mixing device according to Embodiment 1 is described with reference to FIGS. 2 to 8. FIG. 2 is a schematic diagram of the output of the m-th (m is any integer from 1 to M) signal level detector in Embodiment 1. The m-th signal level detector, one of the first to M-th signal level detectors 103 to 104, first calculates the analog voiced level V(n)(m), which is the signal level of the voiced portion at time t(n). It then compares the analog voiced level with voice threshold 120 and calculates and outputs the binarized voiced level BV(n)(m): 1 if the level is at or above the voice threshold, and 0 if it is below. For silence and unvoiced sounds, the level value is 0.
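The binarization step can be sketched as follows. The description does not say how the analog level V(n)(m) is measured, so this sketch assumes the RMS of one frame of samples; `frame_level` and `binarize_level` are hypothetical names. Note that "120" in the text is a reference numeral for the threshold element, not its value, so the threshold is a free parameter here.

```python
import math
from typing import Sequence


def frame_level(samples: Sequence[float]) -> float:
    """Analog voiced level V(n)(m) for one frame.

    Assumption: RMS of the frame; the patent does not specify the measure.
    An empty frame is treated as silence (level 0).
    """
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def binarize_level(level: float, voice_threshold: float) -> int:
    """Binarized voiced level BV(n)(m): 1 at or above the threshold, else 0."""
    return 1 if level >= voice_threshold else 0


print(binarize_level(frame_level([0.5, -0.5, 0.5, -0.5]), voice_threshold=0.3))
```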

FIG. 3 is a schematic diagram of the output of the m-th voiced-onset detector in Embodiment 1. The m-th voiced-onset detector, one of the first to M-th voiced-onset detectors 105 to 106, detects from the output of the m-th signal level detector whether a voiced onset is present. For example, a voiced onset can be deemed detected only when the current value minus the previous value, BV(n)(m) − BV(n−1)(m), equals 1, and deemed not detected when the difference is 0 or −1. The detector outputs a "gain up" signal when a voiced onset is detected and a "gain unchanged" signal otherwise.
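The difference rule above amounts to detecting a 0 → 1 transition in the binarized level. A minimal sketch, assuming each channel starts silent (`bv_initial = 0`); `GainCommand`, `detect_onset`, and `onset_track` are hypothetical names.

```python
from enum import Enum
from typing import List, Sequence


class GainCommand(Enum):
    GAIN_UP = "gain up"
    UNCHANGED = "gain unchanged"


def detect_onset(bv_now: int, bv_prev: int) -> GainCommand:
    # Onset only when BV(n)(m) - BV(n-1)(m) == 1 (a 0 -> 1 transition).
    # 0 (still voiced / still silent) or -1 (speech ended) is not an onset.
    return GainCommand.GAIN_UP if bv_now - bv_prev == 1 else GainCommand.UNCHANGED


def onset_track(bv_seq: Sequence[int], bv_initial: int = 0) -> List[GainCommand]:
    """Apply detect_onset over a channel's BV sequence (assumes it starts silent)."""
    prev = bv_initial
    out = []
    for bv in bv_seq:
        out.append(detect_onset(bv, prev))
        prev = bv
    return out


print(onset_track([0, 1, 1, 0, 1]))
```

Only frames 1 and 4 produce "gain up"; a sustained voiced segment triggers the boost once, at its start.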

FIG. 4 is a schematic diagram of the output of voice count detector 121 in Embodiment 1. Voice count detector 121 adds the binarized voiced levels BV(n)(m) output by the first to M-th signal level detectors 103 to 104 and takes the sum as the number of valid voices. The number of valid voices at time t(n) is denoted VP(n).

FIG. 5 is a schematic diagram of the output of the gain change judgment unit in Embodiment 1. Gain change judgment unit 123 judges, by comparison, whether the number of valid voices is less than valid-voice-count threshold 122. It outputs "gain change OFF" when the number of valid voices is at or above the threshold, and "gain change ON" when it is below.
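The voice count and the gain change judgment reduce to a sum and a strict comparison: VP(n) is the sum of BV(n)(m) over m = 1..M, and gain change is ON only when VP(n) is below the threshold. The function names below are hypothetical.

```python
from typing import Sequence


def count_valid_voices(bv: Sequence[int]) -> int:
    """VP(n): sum of the binarized voiced levels BV(n)(1..M)."""
    return sum(bv)


def gain_change_on(vp: int, valid_voice_threshold: int) -> bool:
    """Gain change ON when VP(n) is strictly below the threshold, OFF otherwise."""
    return vp < valid_voice_threshold


vp = count_valid_voices([1, 0, 1, 1, 0])
print(vp, gain_change_on(vp, valid_voice_threshold=3))
```

Because the comparison is strict, reaching the threshold exactly already turns gain change OFF, which matches the t(b) case in the FIG. 8 example.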

FIG. 6 is a schematic diagram of the output of the m-th gate in Embodiment 1. In the m-th gate, one of the first to M-th gates 107 to 108, the output of the m-th voiced-onset detector is controlled by the output of gain change judgment unit 123. For example, when the m-th voiced-onset detector indicates gain up, the gate outputs gain up if gain change judgment unit 123 indicates gain change ON, but outputs gain unchanged if it indicates gain change OFF. When the m-th voiced-onset detector indicates gain unchanged, the gate outputs gain unchanged whether gain change is ON or OFF. The outputs of gates 107 to 108 control the gains of gain controllers 109 to 110, respectively: on gain up the volume is raised, and on gain unchanged the volume is left as is.
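The gate is a logical AND of the onset detector's request and the gain change judgment, and the gain controller then scales the samples. The description does not state how much the volume is raised, so the 6 dB boost below is a hypothetical value; `gate` and `apply_gain` are illustrative names.

```python
from typing import List, Sequence

# Hypothetical boost amount: the patent only says the volume is "raised".
GAIN_UP_DB = 6.0


def gate(onset_says_gain_up: bool, gain_change_on: bool) -> bool:
    """m-th gate: pass "gain up" only when the judgment unit reports gain change ON."""
    return onset_says_gain_up and gain_change_on


def apply_gain(samples: Sequence[float], gain_up: bool,
               boost_db: float = GAIN_UP_DB) -> List[float]:
    """m-th gain controller: boost by boost_db on gain up, otherwise pass through."""
    factor = 10 ** (boost_db / 20) if gain_up else 1.0
    return [s * factor for s in samples]


print(apply_gain([1.0], gate(True, True), boost_db=20.0))
```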

FIG. 7 is a flowchart of the gain control of the voice mixing device according to Embodiment 1. In S701, voice information arriving over network 100 is received by the first to M-th voice information transceivers 101 to 102, and the first to M-th voice information at time t(n) is input. In S702, the gains of all gain controllers 109 to 110 are reset. In S703, each of signal level detectors 103 to 104 detects the voiced level of the output of its corresponding transceiver 101 to 102. In S704, each of voiced-onset detectors 105 to 106 detects, from the output of its corresponding signal level detector 103 to 104, whether a voiced onset is present. In S705, voice count detector 121 detects the number of valid voices. In S706, gain change judgment unit 123 judges whether the number of valid voices is less than the valid-voice-count threshold; if it is at or above the threshold, processing proceeds to S707, and if it is below, to S708. In S707, the gains of all gain controllers 109 to 110 are left unchanged. In S708, the gain is raised only for the gain controllers whose channels showed a voiced onset, and left unchanged for the others. In S709, the outputs of gain controllers 109 to 110 are input to voice information synthesizer 124 and combined. The output of synthesizer 124 is input to transceivers 101 to 102 and sent to each speaker over network 100. In S710, processing for time t(n) ends, and the device waits to process time t(n+1).
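Steps S701 to S710 can be put together into one per-frame loop. This is a sketch under several assumptions not stated in the patent: each channel delivers an equal-length frame of float samples per time step, the level is the frame RMS, "gain up" multiplies by a fixed `boost` factor, and the S702 reset returns every gain to 1.0 at each step. `VoiceMixer` and its parameters are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    # Assumption: V(n)(m) is measured as the RMS of the frame.
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


class VoiceMixer:
    """Sketch of the FIG. 7 gain-control loop for M channels."""

    def __init__(self, num_channels: int, voice_threshold: float,
                 valid_voice_threshold: int, boost: float = 2.0) -> None:
        self.voice_threshold = voice_threshold
        self.valid_voice_threshold = valid_voice_threshold
        self.boost = boost
        self.prev_bv = [0] * num_channels  # assume all channels start silent

    def process(self, frames: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
        # S701: frames[m] is channel m's input at time t(n).
        gains = [1.0] * len(frames)  # S702: reset all gains
        bv = [1 if rms(f) >= self.voice_threshold else 0 for f in frames]  # S703
        onsets = [b - p == 1 for b, p in zip(bv, self.prev_bv)]  # S704
        self.prev_bv = bv
        vp = sum(bv)  # S705: number of valid voices
        if vp < self.valid_voice_threshold:  # S706
            gains = [self.boost if o else 1.0 for o in onsets]  # S708
        # Otherwise S707: all gains stay unchanged at 1.0.
        length = len(frames[0])
        mixed = [sum(g * f[i] for g, f in zip(gains, frames)) for i in range(length)]  # S709
        return mixed, gains  # S710: caller feeds the next time step


mixer = VoiceMixer(num_channels=2, voice_threshold=0.5, valid_voice_threshold=3)
print(mixer.process([[1.0, 1.0], [0.25, 0.25]]))
```

Because the gains are reset every step, the boost in this sketch applies only to the frame in which the onset is detected; a real implementation might hold the boost for a short window, which the patent leaves open.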

FIG. 8 is a voice state diagram for explaining the operation of the voice mixing device according to Embodiment 1. Assume that speakers A to E take part in a conversation and speak at the timings shown, and that the valid-voice-count threshold is 3. When speaker B's voiced onset is detected at time t(a), the number of valid voices is calculated. Because speaker A is speaking, the number of valid voices including B is 2. This is below the threshold, so speaker B's voice packets are gained up before being mixed. When speaker E's voiced onset is detected at time t(b), A and B are speaking, so the number of valid voices is 3. This is at or above the threshold, so speaker E's voice packets are mixed with the gain unchanged. The same applies at time t(c): the number of valid voices is 3, which is at or above the threshold, so speaker A's voice packets are mixed with the gain unchanged. When speaker D's voiced onset is detected at time t(d), the number of valid voices is 2, so speaker D's voice packets are gained up before being mixed.
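The four decisions in this example can be replayed directly from the counts stated in the text (the full FIG. 8 timing is not reproduced here). The event list and `decision` helper are illustrative only.

```python
VALID_VOICE_THRESHOLD = 3  # value used in the FIG. 8 example

# (time, speaker whose voiced onset is detected, number of valid voices incl. that speaker)
EVENTS = [("t(a)", "B", 2), ("t(b)", "E", 3), ("t(c)", "A", 3), ("t(d)", "D", 2)]


def decision(vp: int, threshold: int = VALID_VOICE_THRESHOLD) -> str:
    """Gain up only while the number of valid voices stays below the threshold."""
    return "gain up" if vp < threshold else "gain unchanged"


for time, speaker, vp in EVENTS:
    print(time, speaker, vp, decision(vp))
```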

As described above, according to Embodiment 1, when a small group uses a voice conference in an environment with heavy background noise, the gain of the first sound of each utterance, which tends to be hard to hear, is raised. This reduces requests to repeat and lets the conversation proceed efficiently. When a large group uses the voice conference, on the other hand, raising the gain of the first sound of interrupting utterances is suspended, so the voice of a speaker who joins partway through does not disrupt the ongoing conversation. Users can therefore use the voice conference without operations, such as volume adjustment, that disrupt the conference.

The above description assumes M each of the voice information transceivers, signal level detectors, voiced-onset detectors, gates, and gain controllers. When time-division processing is used, however, one of each suffices.

The means described above can also be implemented in software, using a program that causes a computer to function as those means.

Further, in the above description the inputs to signal level detectors 103 to 104 are the received signals from voice information transceivers 101 to 102. Alternatively, the inputs to signal level detectors 103 to 104 may be those received signals after passing through AGC (automatic gain control) that adjusts each participant's volume level. In this case, the AGC removes differences in participants' volume levels caused, for example, by differing distances between the microphone and the speaker, giving a more comfortable call environment.
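The patent does not describe the AGC itself. As a hedged illustration only, a very simple per-participant AGC could smooth its gain toward a target RMS level, cap the gain, and freeze it on near-silent frames so that background noise is not amplified. `SimpleAGC` and all of its parameters are hypothetical.

```python
import math
from typing import List


class SimpleAGC:
    """Hypothetical per-participant AGC placed before the signal level detector."""

    def __init__(self, target_rms: float = 0.1, max_gain: float = 10.0,
                 smoothing: float = 0.2, floor: float = 1e-4) -> None:
        self.target_rms = target_rms  # desired output level
        self.max_gain = max_gain      # cap so quiet noise is not blown up
        self.smoothing = smoothing    # 0 < smoothing <= 1; 1 means jump immediately
        self.floor = floor            # below this RMS the gain is frozen
        self.gain = 1.0

    def process(self, frame: List[float]) -> List[float]:
        level = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
        if level > self.floor:
            desired = min(self.target_rms / level, self.max_gain)
            self.gain += self.smoothing * (desired - self.gain)
        return [s * self.gain for s in frame]


quiet, loud = SimpleAGC(target_rms=1.0, smoothing=1.0), SimpleAGC(target_rms=1.0, smoothing=1.0)
print(quiet.process([0.5, -0.5]), loud.process([2.0, -2.0]))
```

With `smoothing=1.0`, a quiet speaker (RMS 0.5) and a loud one (RMS 2.0) both come out at RMS 1.0, which is the levelling effect the paragraph describes; a smaller smoothing factor trades response speed for fewer audible gain jumps.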
(Embodiment 2)
Embodiment 1 described a voice mixing device that can be applied regardless of the number of participants. However, when the number of participants is known to be less than the valid-voice-count threshold, the number of valid voices can never reach the threshold and need not be considered, so the device can be simplified as follows.

FIG. 9 is a block diagram showing the configuration of the voice mixing device according to Embodiment 2 of the present invention. Compared with FIG. 1, the following means and signals can be omitted: voice count detector 121, gain change judgment unit 123, the first to M-th gates 107 to 108, and valid-voice-count threshold 122. In this case, the first to M-th voiced-onset detectors 105 to 106 output a gain-up signal when a voiced onset is detected and a gain-unchanged signal otherwise, and this signal is supplied directly to the first to M-th gain controllers 109 to 110.

FIG. 10 is a flowchart of the gain control of the voice mixing device according to Embodiment 2. Compared with FIG. 7, the following steps can be omitted: detecting the number of valid voices in voice count detector 121 (S705), judging in gain change judgment unit 123 whether the number of valid voices is less than the valid-voice-count threshold (S706), and leaving the gains of all gain controllers 109 to 110 unchanged (S707).
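Without S705 to S707, each channel's onset drives its gain controller directly. A minimal sketch under the same assumptions as before (frame RMS as the level, a fixed hypothetical `boost` factor, gains reset to 1.0 every step); `mix_embodiment2` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    # Assumption: the voiced level is measured as the frame RMS.
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def mix_embodiment2(frames: Sequence[Sequence[float]], prev_bv: Sequence[int],
                    voice_threshold: float,
                    boost: float = 2.0) -> Tuple[List[float], List[int], List[float]]:
    """One time step of the FIG. 10 flow: no voice count, no gate.

    Returns (mixed samples, new BV values to pass as prev_bv next time, gains).
    """
    bv = [1 if rms(f) >= voice_threshold else 0 for f in frames]
    # Onset (0 -> 1 transition) boosts that channel unconditionally.
    gains = [boost if b - p == 1 else 1.0 for b, p in zip(bv, prev_bv)]
    length = len(frames[0])
    mixed = [sum(g * f[i] for g, f in zip(gains, frames)) for i in range(length)]
    return mixed, bv, gains


mixed, bv, gains = mix_embodiment2([[1.0, 1.0]] * 3, [0, 0, 0], voice_threshold=0.5)
print(mixed, gains)
```

Here three simultaneous onsets are all boosted, which is acceptable only under this embodiment's premise that the participant count stays below the threshold.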

As described above, according to Embodiment 2, the means that Embodiment 1 needs for large-group voice conferences can be omitted. Users can therefore use the voice conference economically, without operations, such as volume adjustment, that disrupt the conference.

The above description assumes M each of the voice information transceivers, signal level detectors, voiced-onset detectors, and gain controllers. When time-division processing is used, however, one of each suffices.

The means described above can also be implemented in software, using a program that causes a computer to function as those means.

Because the voice mixing device of the present invention has the excellent effect of letting users hear the voices of all speakers at an appropriate volume, it is useful for intercommunication systems and the like in environments with high background noise, such as aircraft cabins.

FIG. 1 is a block diagram showing the configuration of the voice mixing device according to Embodiment 1 of the present invention.
FIG. 2 is a schematic diagram of the output of the m-th signal level detector in Embodiment 1.
FIG. 3 is a schematic diagram of the output of the m-th voiced-onset detector in Embodiment 1.
FIG. 4 is a schematic diagram of the output of the voice count detector in Embodiment 1.
FIG. 5 is a schematic diagram of the output of the gain change judgment unit in Embodiment 1.
FIG. 6 is a schematic diagram of the output of the m-th gate in Embodiment 1.
FIG. 7 is a flowchart of the gain control of the voice mixing device in Embodiment 1.
FIG. 8 is a voice state diagram for explaining the operation of the voice mixing device in Embodiment 1.
FIG. 9 is a block diagram showing the configuration of the voice mixing device according to Embodiment 2 of the present invention.
FIG. 10 is a flowchart of the gain control of the voice mixing device in Embodiment 2.
FIG. 11 is a block diagram showing the configuration of a conventional voice mixing device.

Explanation of Symbols

100 Network
101-102 First to M-th voice information transceivers
103-104 First to M-th signal level detectors
105-106 First to M-th voiced-onset detectors
107-108 First to M-th gates
109-110 First to M-th gain controllers
120 Voice threshold
121 Voice count detector
122 Valid-voice-count threshold
123 Gain change judgment unit
124 Voice information synthesizer

Claims (4)

1. A voice mixing device comprising:
voice information receiving means for receiving a plurality of pieces of voice information from outside;
voiced head part presence/absence detecting means for detecting whether a voiced head part is present in each piece of voice information output by the voice information receiving means;
gain control means for raising the volume of only the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information;
voice information synthesizing means for adding the outputs of the gain control means for the respective pieces of voice information; and
voice information transmitting means for transmitting the output of the voice information synthesizing means to the outside.
2. The voice mixing device according to claim 1, further comprising voice number detecting means for detecting, among the outputs of the voice information receiving means for the respective pieces of voice information, the number of voices at or above a voice threshold,
wherein, when the output of the voice number detecting means is less than an effective voice number threshold, the gain control means raises the volume of only the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information.
3. A program causing a computer to function as:
voice information receiving means for receiving a plurality of pieces of voice information from outside;
voiced head part presence/absence detecting means for detecting whether a voiced head part is present in each piece of voice information output by the voice information receiving means;
gain control means for raising the volume of only the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information;
voice information synthesizing means for adding the outputs of the gain control means for the respective pieces of voice information; and
voice information transmitting means for transmitting the output of the voice information synthesizing means to the outside.
4. The program according to claim 3, further causing the computer to function as voice number detecting means for detecting, among the outputs of the voice information receiving means for the respective pieces of voice information, the number of voices at or above a voice threshold,
wherein, when the output of the voice number detecting means is less than an effective voice number threshold, the gain control means raises the volume of only the voice information for which the output of the voiced head part presence/absence detecting means indicates that a voiced head part is present, without raising the volume of the other voice information.
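The claims describe gain control and mixing functionally, without giving gain values or formulas. The Python sketch below is one way to read claims 1 to 4 under stated assumptions. A channel flagged as being in its voiced head part gets a fixed boost (the `boost_gain=2.0` default, about +6 dB, is an arbitrary illustrative choice). When an effective voice number threshold is supplied, the boost applies only while fewer channels than that threshold are at or above the voice threshold, as claims 2 and 4 require. The mix is then a plain sample-wise sum. All function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence


def count_voices(levels_db: Sequence[float], voice_threshold_db: float) -> int:
    """Voice number detection: channels whose level is at or above the voice threshold."""
    return sum(1 for level in levels_db if level >= voice_threshold_db)


def channel_gains(
    head_flags: Sequence[bool],
    levels_db: Sequence[float],
    voice_threshold_db: float,
    effective_voice_count_threshold: Optional[int] = None,
    boost_gain: float = 2.0,  # assumed boost; the patent gives no value
) -> List[float]:
    """Gain control: boost only channels whose voiced head part is present.

    effective_voice_count_threshold=None boosts unconditionally (claims 1 and 3);
    an integer boosts only while the voice count is below it (claims 2 and 4).
    """
    if len(head_flags) != len(levels_db):
        raise ValueError("head_flags and levels_db need one entry per channel")
    boost_allowed = (
        effective_voice_count_threshold is None
        or count_voices(levels_db, voice_threshold_db) < effective_voice_count_threshold
    )
    # Channels without a voiced head part keep unity gain (volume not raised).
    return [boost_gain if (boost_allowed and is_head) else 1.0 for is_head in head_flags]


def mix(frames: Sequence[Sequence[float]], gains: Sequence[float]) -> List[float]:
    """Voice information synthesis: sample-wise sum of the gain-adjusted channels."""
    if len(frames) != len(gains):
        raise ValueError("one gain is required per channel")
    if not frames:
        return []
    n = len(frames[0])
    if any(len(frame) != n for frame in frames):
        raise ValueError("all channel frames must have the same length")
    mixed = [0.0] * n
    for frame, gain in zip(frames, gains):
        for i, sample in enumerate(frame):
            mixed[i] += gain * sample
    return mixed


if __name__ == "__main__":
    frames = [[0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
    levels = [-20.0, -25.0, -70.0]  # two channels above a -40 dBFS voice threshold
    heads = [True, False, False]  # only channel 0 is at the start of an utterance
    gains = channel_gains(heads, levels, -40.0, effective_voice_count_threshold=3)
    print(gains)  # [2.0, 1.0, 1.0]
    print(mix(frames, gains))
```

Lowering `effective_voice_count_threshold` to 2 in the example disables the boost, because two voices are already active. This matches the idea that boosting an onset is useful only when few participants are talking. Clipping or limiting of the summed signal is not handled here.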

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
JP2008210436A | 2008-08-19 | 2008-08-19 | Voice mixing device, and program

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
JP2008210436A | 2008-08-19 | 2008-08-19 | Voice mixing device, and program

Publications (1)

Publication Number | Publication Date
JP2010050512A | 2010-03-04

Family

ID=42067285

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
JP2008210436A | Pending | 2008-08-19 | 2008-08-19 | Voice mixing device, and program

Country Status (1)

Country | Publications
JP (1) | JP2010050512A

Similar Documents

Publication Title
US10142484B2 (en) Nearby talker obscuring, duplicate dialogue amelioration and automatic muting of acoustically proximate participants
US10149049B2 (en) Processing speech from distributed microphones
US9269367B2 (en) Processing audio signals during a communication event
JP5581329B2 (en) Conversation detection device, hearing aid, and conversation detection method
JP2007290691A (en) Vehicle communication system
US20120303363A1 (en) Processing Audio Signals
JP2006180039A (en) Acoustic apparatus and program
JP6959917B2 (en) Event detection for playback management in audio equipment
EP4064692A1 (en) Smart audio muting in a videoconferencing system
JP2021511755A (en) Speech recognition audio system and method
US10192566B1 (en) Noise reduction in an audio system
JP5251808B2 (en) Noise removal device
JP2006211156A (en) Acoustic device
WO2022181013A1 (en) Meeting system
KR101597768B1 (en) Interactive multiparty communication system and method using stereophonic sound
JP2010050512A (en) Voice mixing device, and program
JP2005157086A (en) Speech recognition device
JP2007096555A (en) Voice conference system, terminal, talker priority level control method used therefor, and program thereof
JP2017022521A (en) Directivity control system and voice output control method
JP7293863B2 (en) Speech processing device, speech processing method and program
JP2019537071A (en) Processing sound from distributed microphones
JP2000049948A (en) Speech communication device and speech communication system
JP2008294600A (en) Sound emission and collection apparatus and sound emission and collection system
JP2007258951A (en) Teleconference equipment
JP2007336395A (en) Voice processor and voice communication system