JPH1188513A

JPH1188513A - Voice processing unit for inter-multi-point communication controller

Info

Publication number: JPH1188513A
Application number: JP24414697A
Authority: JP
Inventors: Shigenobu Matsuda; 茂信松田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1997-09-09
Filing date: 1997-09-09
Publication date: 1999-03-30

Abstract

PROBLEM TO BE SOLVED: To realize a clear voice conference and an increase in the number of channels in the voice processing unit of an inter-multi-point communication controller that connects video conference terminals installed at plural points to realize an inter-multi-point video conference. SOLUTION: A voice detection section 4 silences, or reduces the gain of, voice data from any video conference terminal that a first-come channel discrimination section 5 has discriminated as a non-first-come channel, and outputs the result to a voice adder section 6a. The voice adder section 6a therefore does not need to select channels for voice addition and simply adds the voice data of all channels at all times. The processing is thus simplified, the processor gains spare capacity, and the number of channels that can be handled is increased.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a voice processing device of a multipoint communication control device that realizes a multipoint video conference using video conference terminals at three or more points.

[0002]

[Prior art] FIG. 5 is a block diagram of a conventional voice processing device of a multipoint communication control device disclosed in, for example, Japanese Patent Application Laid-Open No. Hei 4-84553. In the figure, 1a to 1k denote voice encoding/decoding units, 21 denotes a voice detection unit made up of voice level detectors 21a to 21k, 22 denotes a priority selection unit, 6 denotes a voice addition unit, and 7 denotes a control unit.

[0003] Voice data from the video conference terminals (not shown) is delivered via a network to the voice encoding/decoding units 1a to 1k, which decode the input voice data into digital PCM data and output it to the voice detection unit 21 and the priority selection unit 22. In the voice detection unit 21, the voice level detectors 21a to 21k determine the voiced and silent portions of the voice for each channel and output the result to the control unit 7 as data accompanying the voice data. This accompanying data consists of the channel number, the time at which the voiced portion of that channel's voice was recognized, voiced-state information, and the like.

[0004] Next, the operation will be described. In FIG. 5, when a multipoint video conference is held, voice data from each video conference terminal is received via the network by the voice encoding/decoding units 1a to 1k, decoded into digital PCM data, and then transmitted to the voice detection unit 21 and the priority selection unit 22. The voice detection unit 21 monitors the voiced and silent portions of the voice at fixed time intervals and discriminates between a voiced portion and a silent portion according to whether the same state persists after the monitoring time has elapsed.

[0005] On receiving the voiced/silent state and the recognition time from the voice detection unit 21, the control unit 7 first determines whether the number of channels in which a voiced portion was detected is at most a preset number N (N is a positive integer). If it is N or fewer, the control unit 7 controls the priority selection unit 22 so that only the channels in which a voiced portion was detected are connected to the voice addition unit 6. If the number exceeds N, the control unit 7 orders the times at which the voiced portion of each channel was recognized, selects the N channels that started speaking earliest, and controls the priority selection unit 22 so that the selected N channels are connected to the voice addition unit 6. Note that when the number of channels in which a voiced portion was detected is N or fewer, their voice information is output unconditionally to the voice addition unit 6.

[0006] The voice addition unit 6 mixes the voice data of the selected channels (at most N) and outputs the result to the voice encoding/decoding units 1a to 1k. The voice encoding/decoding units 1a to 1k encode it with the same encoding method as the received data and transmit it via the network to each video conference terminal.

[0007]

[Problems to be solved by the invention] Because the conventional voice processing device of the multipoint communication control device is configured as described above, the number of voice channels to be added, as instructed by the priority selection unit, changes with the number of terminals currently speaking and the set value of N. This complicates the processing of the voice addition unit and limits the number of lines that can be added.

[0008] The present invention has been made to solve the above problem. Its object is to obtain a scalable voice processing device of a multipoint communication control device by simplifying the addition processing of the voice addition unit and reducing the restriction on the number of channels to be added.

[0009]

[Means for solving the problems] In view of the above object, the present invention resides in a voice processing device of a multipoint communication control device that realizes a multipoint video conference using video conference terminals at three or more points, comprising: voice encoding/decoding means that, for each channel receiving voice data from one of the video conference terminals, decodes the voice data into digital data, and that encodes the final processed voice data with the same encoding method as the received data and sends it to each video conference terminal; voice detection means that determines, at fixed time intervals, whether the decoded voice data of each channel from the voice encoding/decoding means is a voiced portion or a silent portion, and that generates, for each channel, either the decoded voice data from the voice encoding/decoding means or data equivalent to the case where the decoded data is a silent portion; first-come channel determination means that, based on the determination result of the voice detection means, controls the voice detection means so that, when the number of channels in which a voiced portion is detected is N or fewer (where N is a positive integer), the decoded voice data from the voice encoding/decoding means is generated for all channels, and, when that number exceeds N, the decoded voice data is generated for the N channels whose voiced portions were recognized earliest and the data equivalent to a silent portion is generated for the remaining channels; and voice addition means that adds all of the voice data and silent-portion-equivalent data generated by the voice detection means and sends the resulting processed voice data from the voice encoding/decoding means to each video conference terminal.

[0010] Further, the invention resides in a voice processing device of a multipoint communication control device in which, in the voice detection means, the data equivalent to the case where the decoded data is a silent portion is silence-level data.

[0011] Further, the invention resides in a voice processing device of a multipoint communication control device in which, in the voice detection means, the data equivalent to the case where the decoded data is a silent portion is data obtained by lowering the decoded voice data from the voice encoding/decoding means to a voice level of 1/M (where M is the number of participating video conference terminals).

[0012] Further, the invention resides in a voice processing device of a multipoint communication control device in which, in the voice detection means, the data equivalent to the case where the decoded data is a silent portion is data in which the decoded voice data from the voice encoding/decoding means falls gradually to a voice level of 1/M (where M is the number of participating video conference terminals).

[0013]

[Embodiments of the invention] The present invention will be described below according to each embodiment. Embodiment 1. FIG. 1 is a block diagram showing the configuration of a voice processing device of a multipoint communication control device according to one embodiment of the present invention, in which 1a to 1k denote voice encoding/decoding units, 4 denotes a voice detection unit, 5 denotes a first-come channel determination unit, 6a denotes a voice addition unit, and 7a denotes a control unit.

[0014] FIG. 2 is a block diagram showing an example of the voice detection unit 4 of FIG. 1, in which 41a to 41k denote voiced/silent determination units, 42a to 42k denote silence level generation units, and 43a to 43k denote selectors that select either the received voice data or silence-level data as the voice data output to the voice addition unit 6a.

[0015] Next, the operation will be described. In FIG. 1, when a multipoint video conference is held, voice data from each video conference terminal (not shown) is received via the network by the voice encoding/decoding units 1a to 1k, decoded into digital PCM data, and then transmitted to the voice detection unit 4.

[0016] The voice detection unit 4 monitors the voiced and silent portions of the voice at fixed time intervals, discriminates between a voiced portion and a silent portion according to whether the same state persists after the monitoring time has elapsed, and transmits the voiced/silent determination result to the first-come channel determination unit 5.
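The persistence check described in this paragraph could be sketched as follows. This is only an illustration in Python, not the patent's implementation: the amplitude threshold, the monitoring period expressed in frames (`hold_frames`), and the class name `VoicedSilentDetector` are all assumptions, since the text does not say how the level is measured or how long the monitoring time is.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoicedSilentDetector:
    """Per-channel voiced/silent decision that changes only after a state persists."""
    threshold: int = 500      # hypothetical PCM amplitude threshold
    hold_frames: int = 3      # hypothetical monitoring period, in frames
    state: bool = False       # current decision: True = voiced
    _candidate: bool = False  # raw state observed in the most recent frames
    _run: int = 0             # consecutive frames spent in the candidate state

    def update(self, frame: List[int]) -> bool:
        """Feed one frame of PCM samples and return the current decision."""
        raw = max((abs(s) for s in frame), default=0) >= self.threshold
        if raw == self._candidate:
            self._run += 1
        else:
            self._candidate = raw
            self._run = 1
        # Adopt the new state only once it has lasted the whole monitoring period.
        if self._candidate != self.state and self._run >= self.hold_frames:
            self.state = self._candidate
        return self.state
```

With these assumed settings, a single quiet frame in the middle of speech does not flip the decision; the channel is reported silent only after three consecutive quiet frames.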

[0017] On receiving the voiced/silent determination results from the voice detection unit 4, the first-come channel determination unit 5 first determines whether the number of channels in which a voiced portion was detected is at most N (N is a positive integer), a value set in advance by the control unit 7a. If it is N or fewer, the voice from all terminals is output to the voice addition unit 6a. If the number of channels detected as voiced exceeds N, the times at which the voiced portion was recognized on each channel are ordered, and only the voice of the N channels that started speaking earliest is output to the voice addition unit 6a without level conversion; the voice from the other channels is silenced and then output to the voice addition unit 6a.
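A minimal Python sketch of this first-come decision is given below. The function name `first_come_channels`, the dictionary representation of recognition times, and the tie-break by channel number are assumptions made for illustration; the text only specifies the "N or fewer" and "earliest N" rules.

```python
from typing import Dict, Optional, Set


def first_come_channels(voiced_since: Dict[int, Optional[float]], n: int) -> Set[int]:
    """Return the channels whose voice is passed to the adder unchanged.

    voiced_since maps channel number -> time at which its voiced portion was
    recognized, or None while the channel is judged silent.
    """
    voiced = {ch: t for ch, t in voiced_since.items() if t is not None}
    if len(voiced) <= n:
        # N or fewer talkers: the voice from all terminals passes through.
        return set(voiced_since)
    # More than N talkers: keep the N that started speaking earliest.
    # Ties are broken by channel number (an assumption, not from the text).
    earliest = sorted(voiced, key=lambda ch: (voiced[ch], ch))[:n]
    return set(earliest)
```

Every channel not in the returned set would be treated as non-first-come and replaced by silence-equivalent data before the adder.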

[0018] The voice addition unit 6a always adds the voice from every channel participating in the multipoint conference, but because the voice detection unit 4 has silenced the voice of all channels other than the first-come N channels, the result is the same as adding only the first-come N channels. The voice data added by the voice addition unit 6a is transmitted to the voice detection unit 4, which passes it unprocessed to the voice encoding/decoding units 1a to 1k. The voice encoding/decoding units 1a to 1k encode it with the same encoding method as the received data and transmit it via the network to each video conference terminal.
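The equivalence claimed here, that adding every channel after silencing gives the same result as adding only the first-come channels, can be checked with a small Python sketch. The function names, the use of zero-valued samples as silence-level data (which assumes linear PCM), and the omission of overflow handling are all simplifications for illustration.

```python
from typing import Dict, List, Set


def select(frame: List[int], is_first_come: bool) -> List[int]:
    """Selector: pass the received frame or substitute silence-level data."""
    return frame if is_first_come else [0] * len(frame)


def detection_stage(frames: Dict[int, List[int]], first_come: Set[int]) -> Dict[int, List[int]]:
    """Apply the selector to every channel before the adder sees it."""
    return {ch: select(f, ch in first_come) for ch, f in frames.items()}


def add_all(frames: Dict[int, List[int]]) -> List[int]:
    """Adder: sums every channel sample by sample, with no channel selection."""
    return [sum(samples) for samples in zip(*frames.values())]


frames = {1: [10, 20], 2: [1, 1], 3: [5, -5]}
mixed = add_all(detection_stage(frames, first_come={1, 3}))
only_first_come = add_all({ch: frames[ch] for ch in (1, 3)})
```

Because the adder always iterates over the same fixed set of channels, its control flow does not depend on how many terminals are speaking, which is the simplification the paragraph relies on.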

[0019] In the voice detection unit 4 of FIG. 2, the received voice data 1 to k from the video conference terminals, input from the voice encoding/decoding units 1a to 1k, are input to the voiced/silent determination units 41a to 41k, respectively. The voiced/silent determination units 41a to 41k monitor the voiced/silent state of the voice at fixed time intervals, discriminate between voiced and silent according to whether the same state persists after the monitoring time has elapsed, and transmit the voiced/silent determination results 1 to k to the first-come channel determination unit 5. The received voice data received by the voiced/silent determination units 41a to 41k is also passed unchanged to the selectors 43a to 43k.

[0020] Based on the first-come channel determination results 1 to k sent from the first-come channel determination unit 5, the selectors 43a to 43k select either the received voice data from the voiced/silent determination units 41a to 41k or the silence data from the silence level generation units 42a to 42k, and output it to the voice addition unit 6a.

[0021] The voice data added by the voice addition unit 6a is input to the voice detection unit 4, which outputs it unprocessed to the voice encoding/decoding units 1a to 1k as transmission voice data 1 to k.

[0022] In this way, the voice detection unit 4 silences the voice data from any video conference terminal that the first-come channel determination unit 5 has determined to be a non-first-come channel, so the voice addition unit 6a does not need to select the channels to add and can simply add the voice of all channels at all times. The processing is therefore simplified and the processor gains spare capacity, which makes it possible to increase the number of channels.

[0023] Embodiment 2. FIG. 3 is a block diagram showing another configuration of the voice detection unit in the voice processing device of the multipoint communication control device according to the present invention. Parts identical or equivalent to those of the above embodiment are given the same reference numerals and their description is omitted. 44a to 44k denote 1/M gain control units that lower the voice level of the received voice data to the value obtained by dividing it by M, the number of points participating in the multipoint conference, that is, the number of participating video conference terminals.

[0024] Next, the operation will be described. The received voice data 1 to k from the video conference terminals, input from the voice encoding/decoding units 1a to 1k, are input to the voiced/silent determination units 41a to 41k and also to the 1/M gain control units 44a to 44k. The 1/M gain control units 44a to 44k obtain in advance from the control unit 7a the number of points participating in the multipoint conference, that is, the number of participating video conference terminals, and output to the selectors 43a to 43k voice data obtained by lowering the received voice data to a voice level of 1/M.

[0025] The voiced/silent determination units 41a to 41k monitor the voiced/silent state of the voice at fixed time intervals, discriminate between voiced and silent according to whether the same state persists after the monitoring time has elapsed, and transmit the voiced/silent determination results 1 to k to the first-come channel determination unit 5. The received voice data received by the voiced/silent determination units 41a to 41k is passed unchanged to the selectors 43a to 43k.

[0026] Based on the first-come channel determination results 1 to k sent from the first-come channel determination unit 5, the selectors 43a to 43k select either the received data from the voiced/silent determination units 41a to 41k or the 1/M-level voice data from the 1/M gain control units 44a to 44k, and output it to the voice addition unit 6a.

[0027] In Embodiment 1 described above, when no video conference terminal is speaking, the outputs of the silence level generation units 42a to 42k are sent to the voice addition unit 6a, so completely silent data is sent from the voice addition unit 6a to each terminal. Besides the speaker's voice, the voice data sent from each video conference terminal contains low-level audio, namely background noise, that is in principle unwanted.

[0028] In an actual video conference, however, complete silence, with no sound at all coming from the terminal, often feels unnatural to users; leaving some background noise makes the multipoint conference feel more natural.

[0029] If, however, the background noise from each terminal were output unprocessed to the selectors 43a to 43k, the background noise would be added up across the M participating points, raising the background noise level and interfering with the multipoint conference. To solve this problem, the 1/M gain control units 44a to 44k lower the level of the background noise to the value obtained by dividing it by the number of participating points M before it is output to the voice addition unit 6a. The summed background noise level then equals the level of a single terminal, and a natural-sounding multipoint conference can be held.
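The arithmetic behind the 1/M scaling can be illustrated with a short Python sketch. It makes a strong simplifying assumption that is not in the text: every terminal contributes the same background-noise samples, so the noise adds sample by sample. The function names and sample values are hypothetical.

```python
from typing import List


def one_over_m(frame: List[float], m: int) -> List[float]:
    """1/M gain control: scale each sample by 1/M (M = participating terminals)."""
    return [s / m for s in frame]


def mix(frames: List[List[float]]) -> List[float]:
    """Sum all channels sample by sample, as the adder does."""
    return [sum(samples) for samples in zip(*frames)]


noise = [40.0, -80.0, 20.0]  # hypothetical background-noise frame
m = 4                        # hypothetical number of participating terminals

unscaled = mix([noise] * m)               # noise from M terminals added as-is
scaled = mix([one_over_m(noise, m)] * m)  # each terminal scaled by 1/M first
```

Under this assumption the unscaled mix is M times one terminal's noise, while the scaled mix equals exactly one terminal's noise. Real background noise differs from terminal to terminal, so the summed level would only approximate this idealized case.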

[0030] Embodiment 3. FIG. 4 is a block diagram showing still another configuration of the voice detection unit in the voice processing device of the multipoint communication control device according to the present invention. Parts identical or equivalent to those of the above embodiments are given the same reference numerals and their description is omitted. 45a to 45k denote 1/M gain control units with a fade-out function, which lower the voice level of the received voice data by fading it out to the value obtained by dividing it by the number of participating points M.

[0031] Next, the operation will be described. The received voice data 1 to k from the video conference terminals, input from the voice encoding/decoding units 1a to 1k, are input to the voiced/silent determination units 41a to 41k and also to the 1/M gain control units 45a to 45k with a fade-out function. The 1/M gain control units 45a to 45k with a fade-out function obtain in advance from the control unit 7a the number of points participating in the multipoint conference, that is, the number of participating video conference terminals, and output to the selectors 43a to 43k voice data obtained by fading the received voice data out to a voice level of 1/M.

[0032] The voiced/silent determination units 41a to 41k monitor the voiced/silent state of the voice at fixed time intervals, determine whether the voice is voiced or silent according to whether the same state persists after the monitoring time has elapsed, and transmit the voiced/silent determination results 1 to k to the first-come channel determination unit 5. The received voice data received by the voiced/silent determination units 41a to 41k is passed unchanged to the selectors 43a to 43k.

[0033] Based on the first-come channel determination results 1 to k sent from the first-come channel determination unit 5, the selectors 43a to 43k select either the received data from the voiced/silent determination units 41a to 41k or the 1/M-level voice data from the 1/M gain control units 45a to 45k with a fade-out function, and output it to the voice addition unit 6a.

[0034] In Embodiment 2 described above, when a terminal that was determined to be a first-come channel stops speaking, the first-come channel determination result changes to non-first-come and is notified to the selectors 43a to 43k of the voice detection unit 4, so the output switches abruptly from the output data of the voiced/silent determination units 41a to 41k to the output data of the 1/M gain control units 44a to 44k. This can be jarring: the end of the speaker's words may be cut off, or the sound may become suddenly quiet the moment the speaker finishes.

[0035] The 1/M gain control units 45a to 45k with a fade-out function therefore use the first-come channel determination results 1 to k sent from the first-come channel determination unit 5. While a channel is first-come, the received voice data from the voiced/silent determination units 41a to 41k is output to the selectors 43a to 43k without gain control. Then, from the moment the speech ends and the channel is determined to be non-first-come, the gain of the received voice data is lowered gradually until it finally reaches 1/M. This processing achieves natural-sounding voice addition without losing the end of the speaker's words.
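The gradual gain reduction could look like the following Python sketch. The text only says the gain is lowered gradually to 1/M; the linear ramp, the per-frame step, the `ramp_frames` parameter, and the class name `FadeOutGain` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FadeOutGain:
    """1/M gain control with fade-out for a single channel."""
    m: int                # number of participating terminals
    ramp_frames: int = 4  # hypothetical fade length, in frames
    gain: float = 1.0     # current gain; 1.0 means no gain control

    def process(self, frame: List[float], is_first_come: bool) -> List[float]:
        floor = 1.0 / self.m
        if is_first_come:
            # First-come channel: pass the voice without gain control.
            self.gain = 1.0
        else:
            # Non-first-come: step the gain down linearly until it reaches 1/M.
            step = (1.0 - floor) / self.ramp_frames
            self.gain = max(floor, self.gain - step)
        return [s * self.gain for s in frame]
```

For example, with `m=4` and `ramp_frames=3` the gain goes 1.0, 0.75, 0.5, 0.25 over the frames after the channel loses first-come status and then stays at 0.25, so the tail of the utterance is attenuated progressively rather than cut.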

[0036]

[Effects of the invention] As described above, according to the present invention, the voice processing device of a multipoint communication control device that realizes a multipoint video conference using video conference terminals at three or more points comprises: voice encoding/decoding means that, for each channel receiving voice data from one of the video conference terminals, decodes the voice data into digital data, and that encodes the final processed voice data with the same encoding method as the received data and sends it to each video conference terminal; voice detection means that determines, at fixed time intervals, whether the decoded voice data of each channel from the voice encoding/decoding means is a voiced portion or a silent portion, and that generates, for each channel, either the decoded voice data from the voice encoding/decoding means or data equivalent to the case where the decoded data is a silent portion; first-come channel determination means that, based on the determination result of the voice detection means, controls the voice detection means so that, when the number of channels in which a voiced portion is detected is N or fewer (where N is a positive integer), the decoded voice data from the voice encoding/decoding means is generated for all channels, and, when that number exceeds N, the decoded voice data is generated for the N channels whose voiced portions were recognized earliest and the data equivalent to a silent portion is generated for the remaining channels; and voice addition means that adds all of the voice data and silent-portion-equivalent data generated by the voice detection means and sends the resulting processed voice data from the voice encoding/decoding means to each video conference terminal. Voice addition therefore only has to add the voice from all video conference terminals at all times, which simplifies the processing. This leaves spare capacity in the processor or other hardware used for voice addition, so the number of channels that can be added is increased, and a voice processing device of a multipoint communication control device capable of processing the voice of more video conference terminals can be provided.

[0037] Further, because in the voice detection means the data equivalent to the case where the decoded data is a silent portion is silence-level data, a voice processing device of a multipoint communication control device capable of processing the voice of more video conference terminals can be provided with a simple configuration.

[0038] Further, because in the voice detection means the data equivalent to the case where the decoded data is a silent portion is data obtained by lowering the decoded voice data from the voice encoding/decoding means to a voice level of 1/M (where M is the number of participating video conference terminals), there is no completely silent state in which the terminal outputs no sound at all. A voice processing device of a multipoint communication control device that leaves some background noise and does not feel unnatural to users can thus be provided.

【００３９】さらに上記音声検出手段において、上記復
号されたデータが無音部分である時に相当するデータ
を、上記音声符号化・復号手段からの上記復号した音声
データが１／Ｍ(但しＭは参加テレビ会議端末数)の音声
レベルまで緩やかに下がるデータとしたので、端末より
何の音声も出力されない完全な無音状態がなく、ある程
度の背景雑音を残すようにすると共に、さらに発言者の
音声の語尾が途切れたり、発言が終わった途端、急に静
かになるなどの違和感も使用者に与えない多地点間通信
制御装置の音声処理装置を提供できる等の効果が得られ
る。Further, in the voice detection means, the data corresponding to the case where the decoded data is a silent portion is the decoded voice data from the voice encoding/decoding means lowered gradually to a voice level of 1/M (where M is the number of participating video conference terminals). In addition to avoiding a completely silent state and leaving some background noise, this keeps the end of a speaker's words from being cut off and keeps the sound from going abruptly quiet the moment a remark ends, so that a voice processing device of a multipoint communication control device that does not feel unnatural to the user can be provided.
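
A gradual reduction to the 1/M level might look like the following sketch. The source states only that the level falls gradually to 1/M. The linear ramp, the per-frame `step` value, the immediate return to full gain when the channel is passed again, and the class name `FadeToFloorGain` are all assumptions made for illustration.

```python
class FadeToFloorGain:
    """Per-channel gain that ramps down to 1/M instead of dropping at once (hypothetical)."""

    def __init__(self, m, step=0.25):
        self.floor = 1.0 / m   # target level: 1/M of the decoded voice level
        self.step = step       # gain decrease per frame (assumed value)
        self.gain = 1.0

    def process(self, frame, passed):
        """Apply the current gain to one frame and return the scaled samples."""
        if passed:
            # Channel is passed through: full level (assumed to be restored immediately).
            self.gain = 1.0
        else:
            # Channel is not passed: lower the gain one step, never below 1/M.
            self.gain = max(self.floor, self.gain - self.step)
        return [sample * self.gain for sample in frame]
```

The gain here changes once per frame for brevity. A finer ramp, for example per sample, would follow the same pattern.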

[Brief description of the drawings]

【図１】この発明の一実施の形態による多地点間通信
制御装置の音声処理装置の構成を示すブロック図であ
る。FIG. 1 is a block diagram showing a configuration of a voice processing device of a multipoint communication control device according to an embodiment of the present invention.

【図２】図１の音声検出部の一例を示すブロック図で
ある。FIG. 2 is a block diagram showing an example of the voice detection unit in FIG. 1.

【図３】図１の音声検出部の他の例を示すブロック図
である。FIG. 3 is a block diagram showing another example of the voice detection unit in FIG. 1.

【図４】図１の音声検出部さらに別の例を示すブロッ
ク図である。FIG. 4 is a block diagram showing still another example of the voice detection unit in FIG. 1.

【図５】従来の多地点間通信制御装置の音声処理装置
のブロック図である。FIG. 5 is a block diagram of a conventional voice processing device of the multipoint communication control device.

[Explanation of symbols]

１ａ〜１ｋ音声符号化・復号部、４音声検出部、５
先着チャネル判定部、６ａ音声加算部、７ａ制御
部、４１ａ〜４１ｋ有声／無音判定部、４２ａ〜４２
ｋ無音レベル生成部、４３ａ〜４３ｋセレクタ、４
４ａ〜４４ｋ１／Ｍ利得制御部、４５ａ〜４５ｋフェ
ードアウト機能付１／Ｍ利得制御部である。1a to 1k: voice encoding/decoding units; 4: voice detection unit; 5: first-arrival channel determination unit; 6a: voice addition unit; 7a: control unit; 41a to 41k: voiced/silent determination units; 42a to 42k: silence level generation units; 43a to 43k: selectors; 44a to 44k: 1/M gain control units; 45a to 45k: 1/M gain control units with fade-out function.

Claims

1. A voice processing device of a multipoint communication control device for realizing a multipoint video conference using video conference terminals at three or more points, comprising: voice encoding/decoding means, provided for each channel that receives voice data from one of the video conference terminals, for decoding the voice data into digital data, and for encoding the final processed voice data by the same encoding method as the received data and sending it to each video conference terminal; voice detection means for discriminating, at regular time intervals, between a voiced portion and a silent portion of the decoded voice data of each channel from the voice encoding/decoding means, and for generating, for each channel, either the decoded voice data from the voice encoding/decoding means or data corresponding to the case where the decoded data is a silent portion; first-arrival channel determination means for controlling the voice detection means, on the basis of the discrimination results of the voice detection means, so that the decoded voice data from the voice encoding/decoding means is generated for all channels when the number of channels in which a voiced portion is detected is N or less (where N is a positive integer), and so that, when the number of channels in which a voiced portion is detected exceeds N, the decoded voice data is generated for the N channels in which a voiced portion was recognized earliest and the data corresponding to the silent portion is generated for the remaining channels; and voice addition means for adding all of the voice data and the data corresponding to the silent portion generated by the voice detection means, and for sending the processed voice data to each of the video conference terminals through the voice encoding/decoding means.

2. The voice processing device of a multipoint communication control device according to claim 1, wherein, in the voice detection means, the data corresponding to the case where the decoded data is a silent portion is silence-level data.

3. The voice processing device of a multipoint communication control device according to claim 1, wherein, in the voice detection means, the data corresponding to the case where the decoded data is a silent portion is data obtained by reducing the decoded voice data from the voice encoding/decoding means to a voice level of 1/M (where M is the number of participating video conference terminals).

4. The voice processing device of a multipoint communication control device according to claim 1, wherein, in the voice detection means, the data corresponding to the case where the decoded data is a silent portion is data in which the decoded voice data from the voice encoding/decoding means gradually decreases to a voice level of 1/M (where M is the number of participating video conference terminals).