JPH0484553A

JPH0484553A - Voice mixing device

Info

Publication number: JPH0484553A
Application number: JP19887690A
Authority: JP
Inventors: Noboru Harada; 昇原田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-07-26
Filing date: 1990-07-26
Publication date: 1992-03-17

Abstract

PURPOSE: To obtain a voice signal of excellent quality by automatically extracting, from the voice information sent from conference terminals at plural points, the N lines in which a voiced part was recognized, in the order in which the voice occurred, and synthesizing them. CONSTITUTION: Voice information arriving over a network from conference terminals installed at plural points is received by voice information transmission/reception sections 1-3 and passed to voice level detection sections 4-6. A control section 8, which receives from the detection sections 4-6 the presence or absence of voice and the time at which it was recognized, checks whether the number of lines on which a voiced part was detected is smaller or larger than a preset number N. When it is larger than N, the control section compares the recognition times in time series, selects the N lines on which voice occurred earliest, and controls a priority selection section 7 to connect those N lines to a voice information synthesis section 9, which synthesizes the voice information of at most N selected lines. A voice signal of excellent quality is thus obtained.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to an audio mixing device used in an audio conference system, and in particular to an audio mixing device that, in a multipoint conference system spanning multiple locations, selects and synthesizes only a limited number of audio streams from the audio information of the locations that are speaking simultaneously.

[Conventional technology]

Conventionally, an audio mixing device of this type simply synthesized all the audio information input from the conference terminals at the multiple locations.

[Problem to be solved by the invention]

When audio information from multiple locations is synthesized, and although it depends on the environment of each conference room in which a conference terminal is installed, unconditionally synthesizing the audio from five or more conference rooms superimposes the echo from each room and makes the synthesized audio hard to hear. For operational or technical reasons, synthesis is therefore limited to about four locations. Because conventional audio mixing devices synthesized the input audio information unconditionally, conference services provided by multipoint conference systems and the like were restricted, particularly in operation. That is, they had the drawback that either the number of locations participating in a conference had to be limited to four, or the system had to be operated with poor-quality audio without the echo problem being solved.

An object of the present invention is to provide an audio mixing device with which a multipoint conference system that delivers high-quality audio and offers improved operational convenience can be constructed.

[Means to solve the problem]

The audio mixing device of the present invention is used in a multipoint audio conference system spanning multiple locations and comprises: pull-in means for pulling in audio information from the conference terminals at the multiple locations; identification means for detecting, at fixed time intervals, the audio level of the audio information of each location pulled in by the pull-in means and identifying the voiced and silent parts of the audio; selection means for preferentially selecting, for a certain fixed time, only a specific N items of audio information from among a plurality of items of audio information when voiced parts are detected in them; mixing means for mixing the preferentially selected audio information; and sending means for deleting, from the mixed audio information, the audio information input from each transmission source and sending the remaining mixed audio information to that transmission source.

[Example]

Next, an embodiment of the present invention will be described with reference to the drawings.

Referring to FIG. 1, which shows an embodiment of the present invention, the audio mixing device comprises audio information transmitting/receiving units 1 to 3, audio level detection units 4 to 6, a priority selection unit 7, a control unit 8, and an audio information synthesis unit 9.

Audio information arrives at the audio information transmitting/receiving units 1 to 3 via the exchange network, and the audio information transmitting/receiving units 1 to 3 output this input audio information to the audio level detection units 4 to 6 and to the priority selection unit 7.

The audio level detection units 4 to 6 determine the voiced and silent parts of the audio information on each line and output the result to the control unit 8 as data accompanying the audio information. This accompanying data consists of the line number and, for that line, the time at which a voiced part was recognized, the time at which a silent part was recognized, information on the presence or absence of voice, and the like.
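
As a rough illustration, the accompanying data sent to the control unit could be modeled as a small record. This is a minimal sketch only: the class name `LineStatus` and its field names are hypothetical, and the source does not specify any data layout or time unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineStatus:
    """Accompanying data for one line (all names here are hypothetical)."""
    line_number: int                 # which line the report refers to
    voiced: bool                     # presence/absence of voice on the line
    voiced_since: Optional[float]    # time a voiced part was recognized, if any
    silent_since: Optional[float]    # time a silent part was recognized, if any


# Example report: line 1 became voiced at t = 12.5 (arbitrary time unit).
status = LineStatus(line_number=1, voiced=True, voiced_since=12.5, silent_since=None)
```

Keeping the recognition time alongside the presence flag is what lets the control unit later order the lines by when each one started speaking.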

FIG. 2 is a diagram for explaining an embodiment of an audio conference system using the audio mixing device of the present invention. Reference numerals 10 to 12 denote conference terminals installed in conference rooms and the like, and 13 denotes an exchange. Of the conference terminals installed at the multiple locations, those participating in a conference are switched through the exchange 13 and connected to the audio mixing device 14.

The operation is described below. Audio information arriving from the conference terminals via the network is received by the audio information transmitting/receiving units 1 to 3 and passed to the audio level detection units 4 to 6 and the priority selection unit 7.

The audio level detection units 4 to 6 monitor the audio for voiced and silent parts at fixed time intervals and identify a part as voiced or silent according to whether the same state still continues after this monitoring time has elapsed.
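
One way to read this rule is as a debounced level detector: a change between voiced and silent is accepted only after the new state has persisted for the whole monitoring time. The sketch below assumes a simple level threshold and counts the monitoring time in frames; the class name `LevelDetector`, the `threshold` and `monitor_frames` parameters, and the sample levels are all hypothetical, since the source gives no concrete values.

```python
class LevelDetector:
    """Debounced voiced/silent detector (illustrative sketch, not the patented circuit)."""

    def __init__(self, threshold: float, monitor_frames: int):
        self.threshold = threshold            # assumed level above which a frame counts as voiced
        self.monitor_frames = monitor_frames  # assumed monitoring time, in frames
        self.voiced = False                   # confirmed state
        self.changed_at = None                # frame index of the last confirmed change
        self._candidate = False               # raw state currently being observed
        self._run = 0                         # consecutive frames in the candidate state

    def update(self, frame_index: int, level: float) -> bool:
        raw = level >= self.threshold
        if raw == self._candidate:
            self._run += 1
        else:
            self._candidate = raw
            self._run = 1
        # Accept the change only if the state persisted for the whole monitoring time.
        if self._candidate != self.voiced and self._run >= self.monitor_frames:
            self.voiced = self._candidate
            self.changed_at = frame_index
        return self.voiced


detector = LevelDetector(threshold=0.1, monitor_frames=3)
levels = [0.0, 0.5, 0.0, 0.5, 0.6, 0.7, 0.0, 0.8]
states = [detector.update(i, level) for i, level in enumerate(levels)]
```

With these made-up levels, the single loud frame at index 1 and the single quiet frame at index 6 are both ignored, and the line is confirmed as voiced only at index 5.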

Having received the presence or absence of voice and the recognition times from the audio level detection units 4 to 6, the control unit 8 first checks whether the number of lines on which a voiced part was detected is smaller than a preset number N of lines (N is a positive integer). If it is smaller, the control unit controls the priority selection unit 7 so that only the lines on which a voiced part was detected are connected to the audio information synthesis unit 9. If it is larger, the control unit checks in time series the times at which the voiced parts were recognized on those lines, selects N lines in the order in which the voice occurred, earliest first, and controls the priority selection unit 7 to connect the selected N lines to the audio information synthesis unit 9. When the number of lines on which a voiced part was detected is N or fewer, their audio information is output to the audio information synthesis unit 9 unconditionally.
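
The selection rule can be sketched in a few lines. The function name `select_lines` and the onset times below are hypothetical and serve only to illustrate the ordering. Ties are broken by dictionary insertion order here, which is an assumption; the source does not say how simultaneous onsets are handled.

```python
from typing import Dict, List


def select_lines(onset_times: Dict[str, float], n: int) -> List[str]:
    """Return at most n voiced lines, earliest recognized onset first.

    onset_times maps each currently voiced line to the time at which its
    voiced part was recognized.
    """
    ranked = sorted(onset_times, key=lambda line: onset_times[line])
    # With n or fewer voiced lines the slice keeps them all (the unconditional case).
    return ranked[:n]


# Five voiced lines, N = 4: the line that started speaking last is dropped.
chosen = select_lines({"a": 3.0, "b": 2.0, "c": 5.0, "d": 4.0, "e": 1.0}, n=4)
# Three voiced lines, N = 4: all of them are passed on.
all_kept = select_lines({"a": 3.0, "c": 5.0, "d": 4.0}, n=4)
```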

The audio information synthesis unit 9 synthesizes the audio information of the selected lines, at most N of them, and outputs the result to the audio information transmitting/receiving units 1 to 3. Each of the audio information transmitting/receiving units 1 to 3 deletes from this synthesized audio information the audio information of its own source, which was input via the exchange, and outputs the remainder to its location via the exchange.
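
This arrangement resembles what is often called a mix-minus: every location hears the mix without its own contribution. The sketch below assumes that synthesis is plain sample-wise addition and that a line which was not selected simply receives the full mix, since it contributed nothing that could be deleted. Both points are assumptions, and the function names and sample values are hypothetical.

```python
from typing import Dict, List


def mix_selected(frames: Dict[str, List[int]], selected: List[str]) -> List[int]:
    """Sum the frames of the selected lines sample by sample (assumed mixing rule)."""
    if not frames:
        return []
    length = len(next(iter(frames.values())))
    mix = [0] * length
    for line in selected:
        for i, sample in enumerate(frames[line]):
            mix[i] += sample
    return mix


def per_line_output(frames: Dict[str, List[int]], selected: List[str]) -> Dict[str, List[int]]:
    """Give each line the mix with its own contribution removed."""
    mix = mix_selected(frames, selected)
    out = {}
    for line, own in frames.items():
        if line in selected:
            out[line] = [m - s for m, s in zip(mix, own)]
        else:
            out[line] = list(mix)  # nothing of its own is in the mix
    return out


frames = {"a": [1, 2], "b": [10, 20], "c": [100, 200]}
result = per_line_output(frames, selected=["a", "b"])
```

Here line a receives only b's samples, line b receives only a's, and the unselected line c receives the sum of both.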

FIG. 3 is a diagram for explaining the concept of the selection method in the priority selection unit 7. In FIG. 3, the time axis shows time on a macro scale (the monitoring time used to check for the presence of voice is not taken into account), and a to e are the audio information lines of locations 1 to 5, respectively. Suppose, for example, that the number N of locations whose audio can be synthesized is 4. At time A the lines on which a voiced part was recognized are a, c, and d, and at time C they are b, c, and e. In both cases the number is smaller than N, so these audio information lines are connected to the audio information synthesis unit 9 and synthesized.

At time B, however, a voiced part is recognized on all of the audio information lines a to e, so the number is N or more. The four audio information lines e, b, a, and d, taken in order from the earliest time at which a voiced part was recognized, are therefore preferentially selected, connected to the audio information synthesis unit 9, and synthesized. The interval between times A and B, or between times B and C, is the fixed time interval at which the voiced and silent parts of the audio are checked; within this interval, the audio information of the lines that were preferentially selected immediately beforehand is output to the audio information synthesis unit 9 and synthesized.
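
The holding behavior between check times can be sketched as a selector that re-evaluates its choice only when a check time arrives and otherwise returns the previous selection. The class name `HeldSelector`, the check interval, and every onset time below are hypothetical; the values were chosen only so that the line sets at times A and B match the ones named in the text.

```python
from typing import Dict, List


class HeldSelector:
    """Re-selects lines only at check times; holds the last choice in between."""

    def __init__(self, n: int, check_interval: float):
        self.n = n
        self.check_interval = check_interval  # assumed spacing of check times
        self._next_check = 0.0
        self._selected: List[str] = []

    def selected(self, now: float, onsets: Dict[str, float]) -> List[str]:
        if now >= self._next_check:
            # Check time reached: rank voiced lines by recognized onset, keep at most n.
            ranked = sorted(onsets, key=lambda line: onsets[line])
            self._selected = ranked[: self.n]
            self._next_check = now + self.check_interval
        return self._selected


selector = HeldSelector(n=4, check_interval=1.0)
# Time A (t = 10.0): lines a, c, d are voiced.
at_a = selector.selected(10.0, {"a": 9.1, "c": 9.4, "d": 9.7})
# Between A and B: more lines are voiced, but the earlier selection is held.
between = selector.selected(10.5, {"a": 9.1, "c": 9.4, "d": 9.7, "e": 10.2, "b": 10.4})
# Time B (t = 11.0): all five lines are voiced, the four earliest are taken.
at_b = selector.selected(11.0, {"e": 10.2, "b": 10.4, "a": 10.6, "d": 10.8, "c": 10.9})
```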

Thus, for example, when a conference is held with 10 participating locations and N = 4, even if audio information is input simultaneously from five or more locations, only the audio information from four locations is automatically synthesized, as described above. The device can therefore adequately cope with the degree of echo that arises in ordinary conference rooms, and the conference can be run without any sense of unnaturalness in operation.

[Effect of the invention]

As described above, the present invention is configured so that, from the audio information transmitted from conference terminals at multiple locations, the N lines on which a voiced part was recognized are automatically extracted in the order in which the voice occurred and synthesized. This has the effect that a multipoint conference system providing high-quality audio and improved operational convenience can be constructed.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 is a diagram for explaining an embodiment of an audio conference system using the audio mixing device of the present invention, and FIG. 3 is a diagram for explaining the concept of the selection method in the priority selection unit.

1 to 3: audio information transmitting/receiving units; 4 to 6: audio level detection units; 7: priority selection unit; 8: control unit; 9: audio information synthesis unit; 10 to 12: conference terminals; 13: exchange; 14: audio mixing device.

Claims

[Claims]

An audio mixing device for a multipoint audio conference system spanning multiple locations, comprising: pull-in means for pulling in audio information from conference terminals at the multiple locations; identification means for detecting, at fixed time intervals, the audio level of the audio information of each location pulled in by the pull-in means and identifying the voiced and silent parts of the audio; selection means for preferentially selecting, for a certain fixed time, only a specific N items of audio information from among a plurality of items of audio information when voiced parts are detected in them; mixing means for mixing the preferentially selected audio information; and sending means for deleting, from the mixed audio information, the audio information input from each transmission source and sending the remaining mixed audio information to that transmission source.