JPH0974446A - Voice communication controller

Info

Publication number: JPH0974446A
Application number: JP8042549A
Authority: JP
Inventors: Ikuichirou Kinoshita; 郁一郎木下; Shigeaki Aoki; 茂明青木; Manabu Okamoto; 学岡本; Nobuo Hayashi; 伸夫林
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-03-01
Filing date: 1996-02-29
Publication date: 1997-03-18

Abstract

PROBLEM TO BE SOLVED: To enhance the intelligibility of transmitted speech in a multipoint conference system by processing voice signals with sound image control parameters so that at least one person's voice is given a sound image different from that of the other voices. SOLUTION: An exchange unit 11 selects the communication lines from terminals requesting participation in a conference and connects them to a voice addition control unit 10 via input channels. A sound image control unit 14 in the voice addition control unit 10 processes the branched voice signals derived from each terminal using one sound image control parameter or a desired combination of such parameters. The processing gives the branched voice signals of each terminal a sound image at a different target spatial position. An adding unit 15 adds the branched voice signals, and a terminal-corresponding branching unit 16 distributes the summed signal to all terminals.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a voice communication control apparatus that controls the addition of voice signals in multipoint communication conferences involving voice communication, such as voice conferences, video conferences, and multimedia conferences conducted over communication lines.

[0002]

[Description of the Related Art] Voice conference apparatuses, multipoint video conference apparatuses, and the like use a voice communication control apparatus that multiplies the voice signal from each conference participant by a weighting coefficient that depends on, for example, the number of simultaneous speakers, adds the results, and returns the added voice signal to each conference participant. Conventional voice communication control apparatuses include those having means for adding the voice signals of all conference participants and those having means for permitting transmission of a voice signal in response to a request to speak.

[0003] In either type of voice communication control apparatus, however, the selected and added voice signal is sent to each terminal over only one channel. In the scheme that adds the voice signals of several people, when several conference participants speak at the same time, their voices are mixed and reproduced from a single sound source (loudspeaker). This lowers the intelligibility of the speech for the listener and at the same time makes it difficult to identify the speaker. Furthermore, in the scheme that permits transmission on request, participants must perform an operation to request to speak, which makes free conversation difficult.

[0004] On the other hand, it is known that when a listener localizes the sound image of each talker's voice at a different spatial position, each talker becomes easier to identify and the intelligibility of the speech improves (D. R. Begault, "Multichannel Spatial Auditory Display for Speech Communication," Journal of the Audio Engineering Society, 42, pp. 819-826, 1994). Here, sound image localization means judging the position of a sound that is heard. Normally the position of the sound source and the position at which the sound image is localized coincide, but techniques have been devised that make a listener localize a sound image at a desired spatial position. The following is known as a representative sound image localization processing method for localizing several sounds at different virtual spatial positions. As shown in FIG. 2, a voice signal $S_1$ is convolved with the acoustic transfer functions $H_{1L}$ and $H_{1R}$ from sound source 1 in FIG. 1 to the left and right ears, respectively. At the same time, a voice signal $S_2$ different from $S_1$ is convolved with the acoustic transfer functions $H_{2L}$ and $H_{2R}$ from sound source 2 to the left and right ears, respectively. The voice signals obtained by the convolutions are added, and the sound of the added voice signals is presented to both ears using stereo headphones or the like. As a result, as shown in FIG. 2, the sound stimuli $S_1 * H_{1L} + S_2 * H_{2L}$ and $S_1 * H_{1R} + S_2 * H_{2R}$, equivalent to those that would reach the two ears from the real sound sources 1 and 2, are obtained at the left and right ears, respectively. The listener can then localize the sounds for the source signals $S_1$ and $S_2$ at the same spatial positions as sound sources 1 and 2 in FIG. 1. Other methods are described in detail in books such as "Spatial Acoustics" (Kajima Shuppankai), edited by Blauert, Goto, and Morimoto.
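The convolve-and-add method of paragraph [0004] can be sketched in a few lines. This is only a minimal illustration, assuming that the voice signals and the acoustic transfer functions are available as short lists of time-domain samples (impulse responses); the function names `convolve`, `add_signals`, and `binaural_mix` are hypothetical and do not come from the patent.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_signals(a, b):
    """Sample-wise sum; the shorter sequence is zero-padded."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def binaural_mix(s1, s2, h1l, h1r, h2l, h2r):
    """Return (left, right) = (S1*H1L + S2*H2L, S1*H1R + S2*H2R)."""
    left = add_signals(convolve(s1, h1l), convolve(s2, h2l))
    right = add_signals(convolve(s1, h1r), convolve(s2, h2r))
    return left, right
```

In practice the impulse responses would be measured or modeled per ear and per source position; the toy values used to exercise this sketch carry no acoustic meaning.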

[0005] An example of prior art that applies the above finding to multipoint voice communication is the conference call terminal apparatus described in JP-A-4-10744 and elsewhere. As shown schematically in FIG. 3, a communication terminal apparatus has been proposed that includes means 3L and 3R for processing the voice signals received from the respective points so that the reproduced sounds are localized at different target spatial positions. To operate this apparatus, it is essential to assign in advance a point number (or terminal number) ID to the terminal (caller) at each point and to transmit this point number ID together with the voice signal as sender control information. The purpose of assigning the point number ID is to identify the originating point and to separate the voice signals point by point. That is, a signal received from another communication terminal apparatus is separated by the signal separation unit 1 into a voice signal and an ID code. In accordance with the separated ID code, the switching control unit 2 selects the voice signal processing means 3R and 3L that convolve the pair of acoustic transfer functions corresponding to the spatial position assigned to the terminal with that ID, and the voice signal separated by the signal separation unit 1 is supplied to the selected processing means. As a result, the reproduced sound image is localized at the assigned spatial position. The drawback is that, to use this apparatus, it must be installed at every point and the communication scheme must be agreed in advance. This drawback hinders the economical realization of multipoint voice communication in which the talkers' voices are localized at different virtual spatial positions between arbitrary points.

[0006] For communication conferences limited to two points, an apparatus has also been proposed that picks up the sound field of the conference room with stereo microphones and transmits the acoustic environment of the local point to the remote point using a stereo codec or the like (for example, U.S. Patent 5,020,098). To apply this to multipoint communication connecting three or more points, however, lines must be connected between every pair of points. As shown in FIG. 4, a terminal 4 equipped with sound image localization signal processing means 4A must be provided at each point, and each terminal 4 must be connected to every other point by a communication line. The required number of lines $C_M$ is then at least the number of pairwise combinations of the $M$ connected terminals, $C_M = M(M-1)/2$. This connection scheme is impractical, not only because the required number of lines becomes enormous as the number of connected terminals $M$ increases, but also because communication is impossible except between terminals that have been connected in advance.
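To make the growth of the line count in paragraph [0006] concrete, the following small sketch evaluates M(M-1)/2 for a full mesh. The comparison figure of one connection per terminal for a centrally connected arrangement is an assumption added here for illustration only; the function names are hypothetical.

```python
def mesh_line_count(num_terminals):
    """Lines needed when every pair of terminals is connected directly: M(M-1)/2."""
    return num_terminals * (num_terminals - 1) // 2


def central_line_count(num_terminals):
    """Assumed comparison: one connection per terminal to a central apparatus."""
    return num_terminals


if __name__ == "__main__":
    for m in (3, 4, 10, 50):
        print(m, mesh_line_count(m), central_line_count(m))
```

For 10 terminals the mesh already needs 45 lines, and for 50 terminals it needs 1225.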

[0007] Other methods have also been proposed that use sound image localization techniques similar to those described above to realize multipoint voice communication in which the listener localizes each talker's voice at a different spatial position. For example, the literature (Cohen M., Koizumi N. and Aoki S., "Design and Control of Shared Conferencing Environments for Audio Telecommunication," Proc. Int. Symp. on Measurement and Control in Robotics, pp. 405-412, Nov. 1992, etc.) proposes a method in which, for communication conducted among multiple points, communication is carried out among any two or more groups of point combinations among those points.

[0008] This method, too, presupposes that each terminal is provided with voice signal processing means, which processes the voice signals transmitted from the other points over the communication lines so that each talker's voice is localized at a different position, and with adding means, which adds the voice signals generated by the voice signal processing means. Furthermore, the originating point must be identified and a voice signal must be transmitted separately for each point. The drawback therefore remains that the communication scheme must be agreed in advance.

[0009]

[Problems to Be Solved by the Invention] An object of the present invention is to provide a voice communication control apparatus that realizes a multipoint communication conference in which the speech remains highly intelligible when several talkers speak at the same time, without requiring each terminal to have a large voice signal processing capability.

[0010] Another object of the present invention is to provide a voice communication control apparatus that can be used by any terminal accommodated in the communication network. Yet another object of the present invention is to provide a voice communication control apparatus that simultaneously realizes communication among two or more combinations of connected terminals, in each of which every listener localizes the voices of several talkers at different positions.

[0011]

[Means for Solving the Problems] The present invention is a voice communication control apparatus that is connected to at least three terminals through communication lines to hold a communication conference. The apparatus includes: an exchange unit for connecting to the terminals via the communication lines; a plurality of input channels that are connected to the exchange unit and receive the voice signals from the terminals; a channel branching unit that branches each input voice signal from the plurality of input channels into branched voice signals on a predetermined plurality of channels; a sound image control unit that applies sound image control processing with predetermined sound image control parameters to the branched voice signals of the plurality of channels corresponding to each terminal, thereby generating sound-image-controlled voice signals; an adding unit that adds, channel by channel, the sound-image-controlled voice signals of the plurality of channels corresponding to the respective terminals, thereby generating added voice signals of the plurality of channels; and a terminal-corresponding branching unit that distributes the added voice signals of the plurality of channels to the respective terminals and supplies them to the exchange unit. The sound image control unit processes the branched voice signals of the plurality of channels corresponding to at least one terminal so as to give them a sound image different from that of the voice signals of the plurality of channels corresponding to each of the other terminals.

[0012]

[Embodiments of the Invention] FIG. 5 shows an outline of a multipoint communication conference system that uses the voice communication control apparatus according to the present invention. The voice communication control apparatus 100 of the present invention has an exchange unit 11. The exchange unit is connected to a communication network such as ISDN or a LAN and can be used from a terminal at any connected point. The maximum number N of participants (terminals) that can take part in a conference at the same time is determined in advance by the scale and processing capacity of the voice communication control apparatus 100. N is an integer of 3 or more. The conference participating terminals TM-1 to TM-4 are selectively connected by the exchange unit 11 to the N input channels $C_1$ to $C_N$. The selectively connected terminals are connected to the voice addition control unit 10 through the input channels $C_1$ to $C_N$, forming a multipoint communication conference system in which the terminals can converse with one another. As described in detail later, the voice addition control unit 10 processes the branched voice signals derived from each terminal using any one of volume (attenuation rate), delay time, phase, and transfer function (referred to here as sound image control parameters), or a desired combination of them. Through this processing, at each terminal TM, at least one of the voices of the several participants reproduced at the same time is given a sound image different from that of the remaining voices.

[0013] FIG. 6 is a block diagram showing the basic configuration of the voice communication control apparatus 100 of the present invention used in the system of FIG. 5. The exchange unit 11 selects the communication lines from the terminals requesting participation in a common conference and connects them to the voice addition control unit 10 through the N input channels $C_1$ to $C_N$. The voice addition control unit 10 consists of: a channel branching unit 13 that branches each of the voice signals input on the N input channels $C_1$ to $C_N$ from up to N connected terminals into branched voice signals on a predetermined number K of branch channels $B_{JL}$, $B_{JR}$ ($J = 1, \ldots, N$), where K is an integer of 2 or more (K = 2 in FIG. 6, corresponding to left and right); a sound image control unit 14 that controls the N sets of K branched voice signals with predetermined sound image control parameters; an adding unit 15 that adds the voice signals of the N corresponding branch channels of the N sets of sound-image-controlled voice signals to generate K channels of added voice signals; and a terminal-corresponding branching unit 16 that branches each of the K channels of added voice signals N ways and supplies them to the exchange unit 11 as N sets of K-channel signals. The channel branching unit 13, the sound image control unit 14, and the adding unit 15 constitute a voice signal processing unit 25. The exchange unit 11 sends the N sets of K-channel signals to the N participating terminals TM-1 to TM-N, respectively. FIG. 6 shows the case in which, for each of the N communication channels, a two-channel (K = 2) voice signal is transmitted from the exchange unit 11 to the participating terminal over two downlinks.

[0014] As noted above, the basic configuration in FIG. 6 shows the case K = 2, in which each branch point 3-1, 3-2, ..., 3-N of the channel branching unit 13 branches the input voice signal into a two-channel signal. Following convention, the two channels correspond to the left and right channels; the symbols of parts relating to the left channel carry the suffix L and those relating to the right channel carry the suffix R. The sound image control unit 14 consists of N sets of signal processing units 4-1L, 4-1R, 4-2L, 4-2R, ..., 4-NL, 4-NR, each of which processes its branched voice signal using sound image control parameters of a predetermined type. As stated earlier, volume, phase, delay time, transfer functions, and the like can be used as the types of sound image control parameter. Taking as an example the case in which each branch point of the channel branching unit 13 branches the input voice signal two ways, the effect of sound image control by each type of parameter is briefly as follows.

(a) Volume (level, attenuation rate, amplification rate, or the like). Controlling the relative volume (level) of the left and right branched voice signals corresponding to each input voice signal makes it possible to steer the sound image that the terminal reproduces from that voice signal through two loudspeakers to a desired direction between the left and right loudspeakers.

(b) Phase (in phase or in opposite phase). Controlling the left and right branched voice signals corresponding to each input voice signal so that they are in phase or in opposite phase with each other makes it possible to give the reproduced sound image a sense of distance or to remove it.

(c) Delay time. Controlling the relative delay of the left and right branched voice signals corresponding to each input voice signal makes it possible to steer the direction of the reproduced sound image to a desired direction in the horizontal plane of the frontal space.

(d) Acoustic transfer function. Convolving the left and right branched voice signals corresponding to each input voice signal with the pair of acoustic transfer functions corresponding to a target position makes it possible to localize the sound image reproduced through stereo headphones at the target position.
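The level, phase, and delay controls of items (a) to (c) can be illustrated on a single input voice signal branched into a left and a right signal. This is a minimal sketch under stated assumptions: signals are lists of samples, delays are whole numbers of samples, and opposite phase is modeled as a sign inversion of the right branch. The names `control_branch` and `control_pair` are hypothetical.

```python
def control_branch(signal, gain, delay, invert, total_length):
    """Scale, optionally invert, and delay one branched signal, then zero-pad."""
    sign = -1.0 if invert else 1.0
    out = [0.0] * delay + [sign * gain * x for x in signal]
    return out + [0.0] * (total_length - len(out))


def control_pair(signal, gain_left, gain_right,
                 delay_left=0, delay_right=0, invert_right=False):
    """Branch one input into left/right and apply level, delay, and phase control."""
    n = len(signal) + max(delay_left, delay_right)
    left = control_branch(signal, gain_left, delay_left, False, n)
    right = control_branch(signal, gain_right, delay_right, invert_right, n)
    return left, right
```

For example, `control_pair(x, 1.0, 0.5, delay_right=1)` makes the right branch quieter and one sample later than the left branch, which would tend to pull the reproduced image toward the left loudspeaker.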

[0015] The sound image control parameters are supplied to the signal processing units 4-1L, 4-1R, 4-2L, 4-2R, ..., 4-NL, 4-NR from a parameter setting unit 14C. One possible method, for example, is to determine the sound image control parameters according to the number of people taking part in the communication conference. In the case of FIG. 6, the voice signals from the signal processing units 4-1L, 4-2L, ..., 4-NL are added by the adder 5L of the adding unit 15 to generate the left-channel added voice signal, and the voice signals from the signal processing units 4-1R, 4-2R, ..., 4-NR are added by the adder 5R of the adding unit 15 to generate the right-channel added voice signal. The K-channel signal distributed from the terminal-corresponding branching unit 16 to each of the participating terminals TM-1 to TM-N therefore contains components of the voice signals transmitted from all participating terminals.
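The branch, control, add, and distribute flow for K = 2 can be sketched as follows. This is an illustrative simplification, not the apparatus itself: it assumes equal-length sample lists for all inputs and uses only a per-terminal pair of gains as the sound image control parameter. The name `mix_conference` is hypothetical.

```python
def mix_conference(inputs, gains):
    """Mix N input signals into one left/right pair and hand a copy to each terminal.

    inputs: list of N equal-length sample lists (one per input channel).
    gains:  list of N (gain_left, gain_right) pairs, one per input channel.
    Returns a list of N (left, right) pairs, one per participating terminal.
    """
    n_samples = len(inputs[0])
    left = [0.0] * n_samples    # accumulates the left-channel sum (adder 5L)
    right = [0.0] * n_samples   # accumulates the right-channel sum (adder 5R)
    for signal, (gain_left, gain_right) in zip(inputs, gains):
        for t, x in enumerate(signal):
            left[t] += gain_left * x     # left branch after level control
            right[t] += gain_right * x   # right branch after level control
    # Every terminal receives the same two-channel added signal.
    return [(list(left), list(right)) for _ in inputs]
```

Because the sums run over all inputs, each terminal's output contains a component from every participating terminal, including its own.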

[0016] Each of the terminals TM-1 to TM-N, one of which is shown as a representative in FIG. 7, consists of a microphone MC, a transmitting unit 51, a decoding unit 52, and reproducing units 53L and 53R. The received K-channel (K = 2) coded voice signal is decoded by the decoding unit 52 into a voice signal for each channel, and these are converted into sound by the reproducing units 53L and 53R, respectively. The sound heard by the user of the terminal TM therefore contains the voices sent from all participating terminals.

[0017] In the present invention, selecting different sound image control parameters for the N sets of branched voice signals in the sound image control unit 14 makes it possible, at each participating terminal TM, to distinguish the sound image of the voice from at least one participating terminal from the sound images of the voices from the other participating terminals. The attributes of the sound image to be controlled include the spatial position and the spaciousness of the sound as perceived by the listener. For example, when the reproducing units 53L and 53R of a terminal are loudspeakers, the sound image can be controlled by controlling, as the sound image control parameter for the left and right voice signals, any one of the inter-channel level difference, the inter-channel time difference, and the phase (in phase or in opposite phase), or a combination of the level difference and the time difference. Accordingly, by appropriately choosing the sound image control parameters applied by the signal processing units 4-1L, 4-1R, 4-2L, 4-2R, ..., 4-NL, 4-NR to the N sets of left and right voice signals in the sound image control unit 14 of FIG. 6, a desired sound image can be given to the voice component from each terminal contained in the sound reproduced at the participating terminal of FIG. 7. When the reproducing units 53L and 53R in FIG. 7 are stereo headphones, K is limited to 2. In addition, when the sound image control unit 14 of FIG. 6 convolves each of the N sets of left and right voice signals with transfer functions corresponding to the spatial position of a desired sound source as the sound image control parameters, the voice component from each terminal reproduced by the reproducing units 53L and 53R of FIG. 7 is given a sound image that is localized at the desired spatial position.
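One way to choose a different inter-channel level difference for each participating terminal is to spread pan positions evenly between the left and right loudspeakers. The sketch below does this with a constant-power (sine/cosine) pan law; that law and the even spacing are assumptions chosen for illustration and are not prescribed by the patent. All function names are hypothetical.

```python
import math


def pan_positions(num_terminals):
    """Evenly spaced pan positions in [0, 1]; 0 is full left, 1 is full right."""
    if num_terminals < 1:
        raise ValueError("at least one terminal is required")
    if num_terminals == 1:
        return [0.5]
    return [j / (num_terminals - 1) for j in range(num_terminals)]


def pan_gains(position):
    """Constant-power pan law (an assumed choice): gains satisfy gl^2 + gr^2 = 1."""
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def parameters_for_conference(num_terminals):
    """One (gain_left, gain_right) pair per participating terminal."""
    return [pan_gains(p) for p in pan_positions(num_terminals)]
```

With three terminals this gives one voice at the left loudspeaker, one in the center, and one at the right, so each voice receives a distinct sound image.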

[0018] The embodiments below are described for the case K = 2, but K ≥ 3 is also possible when loudspeakers are used as the reproducing units of the terminals. Specific embodiments of the present invention are described below with reference to the drawings.

First Embodiment

FIG. 8 shows a first embodiment of the voice communication control apparatus based on the basic configuration of FIG. 6 according to the present invention. A plurality of terminals TM-1 to TM-M are accommodated in the voice communication control apparatus 100 according to the present invention via a communication network 40. In this embodiment, the voice signals on the input channels $C_1$ to $C_N$ from the exchange unit 11, which is connected to the plurality of participating terminals, are monitored to determine which input channel carries the voice signal of the main speaker. The sound image of the reproduced voice of the main speaker is then made distinguishable to the listener from the sound images of the voices from the other terminals.
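The text above states only that the input channels are monitored to decide which one carries the main speaker; it does not give the decision rule at this point. As a purely hypothetical illustration, the sketch below picks the channel whose current frame has the highest mean energy, subject to an assumed silence threshold. The names `frame_energy` and `select_main_speaker` and the threshold value are assumptions.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples (0.0 for an empty frame)."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def select_main_speaker(frames, threshold=1e-4):
    """Return the index of the loudest input channel, or None if all are silent.

    frames: list with one frame (list of samples) per input channel.
    threshold: assumed minimum energy for a channel to count as speech.
    """
    energies = [frame_energy(f) for f in frames]
    if not energies:
        return None
    best = max(range(len(energies)), key=energies.__getitem__)
    return best if energies[best] >= threshold else None
```

A real detector would likely smooth the decision over time so that the main speaker does not switch on every frame; that refinement is omitted here.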

[0019] The voice communication control apparatus 100 of this embodiment includes an exchange unit 11, a demultiplexing/multiplexing unit 22 that separates and multiplexes voice signal data and control signal data or image data, a voice signal decoding unit 23A, a voice detection processing unit 23B, a speaker selection unit 24, a voice signal processing unit 25, a main-speaker voice subtraction unit 26, voice encoding units 27 and 28, a downlink voice signal selection unit 29, a signal processing control unit 20, and an image display control unit 30. The exchange unit 11, the demultiplexing/multiplexing unit 22, the voice signal decoding unit 23A, and the voice detection processing unit 23B perform their processing per connected line and have enough processing capacity for the maximum number N of simultaneously connected terminals.

[0020] FIG. 8 shows an example in which the terminals TM-1 to TM-M are video conference terminals that transmit and receive image information and voice information at the same time. Image information is not essential to the configuration of the present invention, and the image display control unit 30 is not directly related to the gist of the present invention, so a detailed description of image display control is omitted. It is possible, however, to use the image information as the basis for designating the conference in which each of the terminals TM-1 to TM-M takes part and for controlling the configuration of the conference. In that case, signals related to voice signal addition control are sent from the image display control unit 30 to the signal processing control unit 20.

[0021] The operation of the apparatus is described below for the example in which M terminals (TM-1 to TM-M) are connected to the voice communication control apparatus 100 through the communication network 40. A line capable of real-time two-way voice communication is used as the communication network 40. For example, an N-ISDN line, a leased line, an analog telephone line, a LAN line, an individual line, or a multiplexed logical line can be used, and the line may be wired or wireless. The exchange unit 11 must, however, be adapted to the type of communication network 40. This embodiment is described for the case of an N-ISDN line (transmission bandwidth of about 64 kbps for images and about 64 kbps for voice).

【００２２】端末TM-1〜TM-Mの一例として、Ｎ−ＩＳＤ
Ｎ回線を用いたテレビ会議端末を接続することができ
る。ただし、端末TM-1〜TM-Mは２チャネル分の音声信号
を受信する機能を持つ必要がある。端末TM-1〜TM-Mは通
信網４０を通して音声通信制御装置１００の交換部１１
に接続される。例えばＩＴＵ−ＴのＨ．２２１の規格等
に従って１チャネルに多重化されて各端末TM-1〜TM-Mか
ら送られてきた画像情報と音声信号情報、及び会議の構
成を制御するための音声又は画像制御情報は、分離／多
重化部２２で分離される。分離された画像情報及び画像
表示制御情報は画像表示制御部３０に送られる。画像表
示制御はこの発明とは直接関係しないので説明は省略す
る。As an example of the terminals TM-1 to TM-M, video conference terminals using an N-ISDN line can be connected. However, the terminals TM-1 to TM-M must be able to receive audio signals for two channels. The terminals TM-1 to TM-M are connected to the switching unit 11 of the voice communication control apparatus 100 through the communication network 40. The image information and audio signal information sent from each of the terminals TM-1 to TM-M, multiplexed into one channel in accordance with, for example, the ITU-T H.221 standard, together with the audio or image control information for controlling the configuration of the conference, are separated by the demultiplexing / multiplexing unit 22. The separated image information and image display control information are sent to the image display control unit 30. Image display control is not directly related to the present invention, so its description is omitted.

【００２３】音声制御情報は、分離／多重化部２２から
信号処理制御部２０に送られる。音声制御情報として
は、会議参加／退出要求などの制御があげられる。分離
／多重化部２２で分離された音声信号は音声信号復号化
部２３Ａで復号化される。以後の処理を行えるよう、例
えばＰＣＭ符号化音声信号に変換される。簡単のため以
下の処理ではこれを単に音声信号と呼ぶことにする。The voice control information is sent from the demultiplexing / multiplexing unit 22 to the signal processing control unit 20. Examples of the voice control information include control such as conference participation / exit request. The audio signal separated by the demultiplexing / multiplexing unit 22 is decoded by the audio signal decoding unit 23A. For example, it is converted into a PCM coded audio signal so that the subsequent processing can be performed. For simplicity, in the following processing, this will be simply called an audio signal.

【００２４】音声検出処理部２３Ｂでは、例えば音声信
号のパワー検出等の方法により音声検出を行う。音声が
検出されたときには、信号処理制御部２０に検出を示す
制御信号を送出する。音声検出処理部２３Ｂにおける音
声検出方式の一例を図９に示す。入力音声信号波形（図
９Ａ）を元に、単位時間（例えば１００ms間）内のパワ
ー積分値ＩＴ（図９Ｂ）を求める。ここでパワー積分値
ＩＴがＯＮ検出閾値Ｅ _ON，ＯＦＦ検出閾値Ｅ_OFFとの大
小を比べ、発言状態か否かが判定される。The voice detection processing unit 23B detects voice by, for example, detecting the power of the audio signal. When voice is detected, it sends a control signal indicating the detection to the signal processing control unit 20. FIG. 9 shows an example of the voice detection method used in the voice detection processing unit 23B. From the input audio signal waveform (FIG. 9A), the power integrated value IT within a unit time (for example, 100 ms) is obtained (FIG. 9B). The power integrated value IT is then compared with the ON detection threshold E_ON and the OFF detection threshold E_OFF to determine whether the terminal is in the speaking state.

【００２５】第１の発言識別方式では単位時間のパワー
積分値ＩＴがＯＮ検出閾値Ｅ_ONを越えたとき直ちにその
端末を発言状態と判定する。パワー積分値ＩＴがＯＦＦ
検出閾値Ｅ_OFFを下回ったときには直ちに非発言状態と
判定する。従って、図９Ｃに示す斜線を付して示した区
間（ａ−ｂ，ｃ−ｄ，ｆ−ｇ）が発言状態と判定され
る。この第１の発言識別方式では、発言状態と非発言状
態との状態変更が頻繁に行われる。In the first utterance identification method, a terminal is judged to be in the speaking state as soon as the power integrated value IT per unit time exceeds the ON detection threshold E_ON, and is judged to be in the non-speaking state as soon as IT falls below the OFF detection threshold E_OFF. Accordingly, the hatched sections in FIG. 9C (a-b, c-d, f-g) are judged to be speaking states. With this first method, transitions between the speaking and non-speaking states occur frequently.

【００２６】第２の発言識別方式は、第１の発言識別方
式に比べ、単位時間のパワー積分値ＩＴがＯＦＦ検出閾
値Ｅ_OFFを下回った後、一定時間（図９Ｄに示すＴ）は
発言状態が継続しているとみなして発言状態識別が行わ
れる点が異なる。この方法によれば、図９Ｄに斜線を付
して示した区間（ａ−ｅ，ｆ−ｈ）が発言状態と識別さ
れる。The second utterance identification method differs from the first in that, after the power integrated value IT per unit time falls below the OFF detection threshold E_OFF, the speaking state is regarded as continuing for a fixed time (T in FIG. 9D). With this method, the hatched sections in FIG. 9D (a-e, f-h) are identified as speaking states.
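
The two identification methods described for FIG. 9 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names, the frame-based representation of the unit time, and the expression of the hold time T as a number of frames (`hangover_frames`) are all assumptions introduced here.

```python
from typing import List


def frame_power(samples: List[float], frame_len: int) -> List[float]:
    """Integrate signal power over consecutive frames (one frame = one unit time)."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speaking_states(power: List[float], e_on: float, e_off: float,
                    hangover_frames: int = 0) -> List[bool]:
    """Return one speaking / non-speaking flag per frame.

    hangover_frames == 0 corresponds to the first method (immediate switch-off);
    hangover_frames > 0 approximates the second method, where the speaking state
    is held for a time T (assumed here to be counted in frames) after the power
    falls below e_off.
    """
    states = []
    speaking = False
    below = 0  # consecutive frames below e_off while in the speaking state
    for p in power:
        if not speaking:
            if p > e_on:
                speaking = True
                below = 0
        elif p < e_off:
            below += 1
            if below > hangover_frames:
                speaking = False
        else:
            below = 0
        states.append(speaking)
    return states
```

With `hangover_frames=0` short pauses split one utterance into several speaking sections, as in FIG. 9C; a positive value bridges pauses shorter than the hold time, as in FIG. 9D.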

【００２７】音声検出処理部２３Ｂにおいて発言状態と
識別された端末からの音声信号が、図８の話者選択部２
４で選択される。選択された音声信号は選択音声チャネ
ルＡ ₁〜Ａ_Nのいずれかを介して、音声信号処理部２５
に与えられる。音声信号処理部２５はこの発明の基本構
成であるチャネル分岐部１３、音像制御部１４、加算部
１５を含んでいる。The audio signal from a terminal identified as being in the speaking state by the voice detection processing unit 23B is selected by the speaker selection unit 24 in FIG. 8. The selected audio signal is supplied to the audio signal processing unit 25 through one of the selected voice channels A_1 to A_N. The audio signal processing unit 25 includes the channel branching unit 13, the sound image control unit 14, and the adding unit 15, which constitute the basic configuration of the present invention.

【００２８】信号処理制御部２０は、分離／多重化部２
２，音声検出処理部２３Ｂから受信される制御信号、ま
たは画像表示制御部３０から受信される会議制御信号等
に基づいて動作する。ここで現在の発言者数、発言要求
者数、及び議長など常時発言権を与える必要のある端末
の状態などから判別して加算対象の端末に対応する選択
音声チャネルとその優先度を決定する。話者選択部２４
は加算対象となる選択音声チャネルＡ₁〜Ａ_Nを優先順
序に従った入力チャネル位置に接続する。この実施例で
は、主たる話者に対する音声信号を選択音声チャネルＡ
₁に通過させ、第２〜第Ｎの話者に対する音声信号を選
択音声チャネルＡ₂,Ａ₃,…,Ａ_Nに通過させるものとす
る。The signal processing control unit 20 operates based on control signals received from the demultiplexing / multiplexing unit 22 and the voice detection processing unit 23B, or on conference control signals and the like received from the image display control unit 30. From the current number of speakers, the number of terminals requesting to speak, the state of terminals that must always be given the right to speak (such as the chairperson), and so on, it determines the selected voice channels corresponding to the terminals to be added and their priorities. The speaker selection unit 24 connects the selected voice channels A_1 to A_N to be added to input channel positions according to the priority order. In this embodiment, the audio signal of the main speaker is passed to the selected voice channel A_1, and the audio signals of the second to Nth speakers are passed to the selected voice channels A_2, A_3, ..., A_N.

【００２９】信号処理制御部２０は音声信号処理部２５
の動作を制御する。主話者音声の了解度を損なうことな
く複数話者からの音声信号を加算分配する音声信号処理
を行う。この例では主話者の音声信号を左チャネル音声
信号に、その他の話者の音声信号を全て右チャネル音声
信号を通して音声符号化部２８に送出する。音声符号化
部２８ではステレオ符号器を用いて音声信号処理部２５
からの２チャネルステレオ加算音声信号の符号化及び多
重化を行う。下り音声信号選択部２９は図６における端
末対応分岐部１６に相当する。ここで主たる話者以外の
端末に対応する回線に対して、音声符号化部２８からの
符号化されたステレオ加算音声信号が選択される。主話
者の端末に対応する回線に対しては、音声符号化部２７
により符号化されたステレオ音声信号が選択される。た
だし、エコー消去のため主話者用音声減算部２６により
自分の音声信号を左右両チャネル加算音声信号から減算
してから符号化される。選択された音声信号はそれぞれ
分離／多重化部２２に送出される。The signal processing control unit 20 controls the operation of the audio signal processing unit 25, which adds and distributes the audio signals from a plurality of speakers without impairing the intelligibility of the main speaker's voice. In this example, the main speaker's audio signal is sent to the voice encoding unit 28 as the left-channel audio signal, and the audio signals of all other speakers are sent as the right-channel audio signal. The voice encoding unit 28 uses a stereo encoder to encode and multiplex the two-channel stereo added audio signal from the audio signal processing unit 25. The downlink audio signal selection unit 29 corresponds to the terminal-corresponding branching unit 16 in FIG. 6. For the lines corresponding to terminals other than the main speaker, the encoded stereo added audio signal from the voice encoding unit 28 is selected. For the line corresponding to the main speaker's terminal, the stereo audio signal encoded by the voice encoding unit 27 is selected; to cancel echo, the main-speaker voice subtraction unit 26 first subtracts the main speaker's own audio signal from the added audio signals of both the left and right channels before encoding. The selected audio signals are each sent to the demultiplexing / multiplexing unit 22.
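
The echo-cancelling subtraction performed for the main speaker's line might look like the sketch below. The function name and the idea of passing the main speaker's own contribution to each channel as separate arguments are assumptions made for illustration; the source only states that the speaker's own signal is subtracted from both added channels before encoding.

```python
from typing import List, Tuple


def subtract_own_voice(mix_left: List[float], mix_right: List[float],
                       own_left: List[float], own_right: List[float]
                       ) -> Tuple[List[float], List[float]]:
    """Remove a talker's own contribution from each channel of the stereo mix.

    own_left / own_right are the talker's signal as it was added into the
    left and right mixes (after any attenuation), sample-aligned with the mix.
    """
    out_left = [m - o for m, o in zip(mix_left, own_left)]
    out_right = [m - o for m, o in zip(mix_right, own_right)]
    return out_left, out_right
```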

【００３０】分離／多重化部２２では下り音声信号選択
部２９からのステレオ音声信号と画像表示制御部３０か
らの画像情報とを多重化する。多重化された信号は交換
部１１，通信網４０経由で各端末TM-1〜TM-Mに送出され
る。この発明で重要な役割を持つ、人間の聴覚特性と会
議時の習慣に基づいた加算処理及び音像制御処理は音声
信号処理部２５で行われる。音声信号処理部２５は前述
のように図６における信号チャネル分岐部１３、音像制
御部１４、加算部１５を有している。この例では音像制
御パラメータとしてチャネル間音声信号レベル差を制御
する減衰量と、左右チャネル間の音声信号が同位相か逆
位相を制御する位相を使って、各選択音声チャネルの分
岐音声信号を制御する。The demultiplexing / multiplexing unit 22 multiplexes the stereo audio signal from the downlink audio signal selection unit 29 with the image information from the image display control unit 30. The multiplexed signal is sent to each of the terminals TM-1 to TM-M via the switching unit 11 and the communication network 40. The addition processing and sound image control processing based on human auditory characteristics and conference habits, which play an important role in the present invention, are performed in the audio signal processing unit 25. As described above, the audio signal processing unit 25 has the channel branching unit 13, the sound image control unit 14, and the adding unit 15 of FIG. 6. In this example, the branched audio signals of each selected voice channel are controlled using two sound image control parameters: an attenuation amount, which controls the level difference between channels, and a phase, which controls whether the audio signals of the left and right channels are in phase or in opposite phase.

【００３１】図１０は音声信号処理部２５の具体的な一
実施例を示し、この例では音像制御パラメータとしてチ
ャネル間の位相を使って分岐音声信号を制御する場合で
ある。レベル制御部１４Ａと位相制御部１４Ｂは音像制
御部１４を構成している。選択音声チャネルＡ₁ 〜Ａ_N
の音声信号はレベル制御部１４Ａの減衰器4-1,4-2,…,4
-Nによりそれぞれ2^-1/2,N^-1/2,…,N^-1/2 倍のレベルに
減衰される。減衰器4-1〜4-N から出力された音声信号
はチャネル分岐部１３の分岐点3-1,3-2,…,3-Nでそれぞ
れ左右分岐チャネルB_1L,B_1R,…,B_NL,B_NR上の左右チャネ
ル信号に分岐され、位相制御部１４Ｂでそれぞれ位相制
御器4-1L,4-1R,4-2L,4-2R,…,4-NL,4-NRにより互いに同
位相か逆位相にそれぞれ制御される。これらの減衰量及
び位相などの音像パラメータの設定は信号処理制御部２
０からの制御によりパラメータ設定部１４Ｃによって行
われる。FIG. 10 shows a specific embodiment of the audio signal processing unit 25, in which the branched audio signals are controlled using the phase between channels as a sound image control parameter. The level control unit 14A and the phase control unit 14B constitute the sound image control unit 14. The audio signals of the selected voice channels A_1 to A_N are attenuated by the attenuators 4-1, 4-2, ..., 4-N of the level control unit 14A to 2^(-1/2), N^(-1/2), ..., N^(-1/2) times their level, respectively. The audio signals output from the attenuators 4-1 to 4-N are branched at the branch points 3-1, 3-2, ..., 3-N of the channel branching unit 13 into left and right channel signals on the left and right branch channels B_1L, B_1R, ..., B_NL, B_NR, and the phase controllers 4-1L, 4-1R, 4-2L, 4-2R, ..., 4-NL, 4-NR of the phase control unit 14B set each pair to be either in phase or in opposite phase. These sound image parameters, such as the attenuation amounts and phases, are set by the parameter setting unit 14C under the control of the signal processing control unit 20.

【００３２】主話者の音声信号、すなわち選択音声チャ
ネルＡ₁の信号は減衰器4-1 により、例えば2^-1/2 に減
衰され、分岐点3-1 で左チャネルと右チャネルに分岐さ
れ、位相制御器4-1L,4-1R で左右チャネル間で同位相の
まま加算器５Ｌ，５Ｒに与えられる。加算器５Ｌ，５Ｒ
の出力側の左チャネル音声信号、右チャネル音声信号
は、受信端末における左チャネル音声信号、右チャネル
音声信号に対応する。そのため受信端末の受聴者がステ
レオ再生で聴いていたときには、選択音声チャネルＡ₁
（主話者）の音声は中央に音像位置が定位されて距離感
を持って聴取される。The main speaker's audio signal, that is, the signal of the selected voice channel A_1, is attenuated by the attenuator 4-1 to, for example, 2^(-1/2) times its level, branched into the left and right channels at the branch point 3-1, and supplied through the phase controllers 4-1L and 4-1R to the adders 5L and 5R with the left and right channels kept in phase. The left-channel and right-channel audio signals at the outputs of the adders 5L and 5R correspond to the left-channel and right-channel audio signals at the receiving terminal. Therefore, when the listener at the receiving terminal listens in stereo reproduction, the voice on the selected voice channel A_1 (the main speaker) is heard with its sound image localized at the center and with a sense of distance.

【００３３】選択音声チャネルＡ₂〜Ａ_Nの話者の各端
末における再生音声パワーレベルの和が、主話者の再生
音声と同等かそれより小さくなるように、選択音声チャ
ネルＡ₂〜Ａ_Nの音声信号は減衰器4-2〜4-Nにより、例
えばN^-1/2 倍（Ｎは選択チャネルＡ₁〜Ａ_Nの数）に減
衰させた上で分岐点3-2〜3-Nで左右チャネルに分岐さ
れ、右チャネル音声信号は位相制御器4-2L〜4-NLで同位
相のまま加算器５Ｌに与えられ、左チャネル音声信号
は、位相制御器4-2R〜4-NRにより位相反転（−１を乗
算）された上で加算器５Ｒに与えられる。So that the sum of the reproduced voice power levels of the speakers on the selected voice channels A_2 to A_N at each terminal is equal to or lower than that of the main speaker's reproduced voice, the audio signals of the selected voice channels A_2 to A_N are attenuated by the attenuators 4-2 to 4-N to, for example, N^(-1/2) times their level (N being the number of selected channels A_1 to A_N) and then branched into the left and right channels at the branch points 3-2 to 3-N. One branch of each signal is supplied to the adder 5L in phase through the phase controllers 4-2L to 4-NL, and the other branch is phase-inverted (multiplied by -1) by the phase controllers 4-2R to 4-NR and supplied to the adder 5R.

【００３４】ステレオ再生において左右チャネルから互
いに逆位相の音声を提示されたときには、受聴者は距離
感を失った音像を知覚する。この聴覚特性を利用するこ
とにより、前記の選択音声チャネルＡ₂〜Ａ_Nに入る従
たる音声信号は、受信端末の受聴者がステレオ再生で聴
取したときには、受聴者の周囲に距離感を失って知覚さ
れる。一方、主話者の音声は一定方向に定位されて知覚
される。図１０の音声信号処理部２５のレベル制御部１
４Ａにおける減衰器4-1〜4-Nは、端末において再生され
た主話者の音声レベルがその他の話者の音声レベルの和
より大となるようにするためのものであり、主話者の再
生音声の音像とその他の話者の再生音声の音像の定位の
差異は位相制御部１４Ｂにより、もっぱら左右チャネル
を同位相とするか逆位相とするかによって与えられてい
る。In stereo reproduction, when sounds of mutually opposite phase are presented from the left and right channels, the listener perceives a sound image that has lost its sense of distance. By exploiting this auditory characteristic, the subordinate audio signals on the selected voice channels A_2 to A_N are perceived around the listener without a sense of distance when the listener at the receiving terminal listens in stereo reproduction. The main speaker's voice, by contrast, is perceived as localized in a fixed direction. The attenuators 4-1 to 4-N in the level control unit 14A of the audio signal processing unit 25 in FIG. 10 serve to make the main speaker's voice level, as reproduced at the terminal, higher than the sum of the voice levels of the other speakers. The difference in localization between the sound image of the main speaker's reproduced voice and that of the other speakers' reproduced voices is produced by the phase control unit 14B solely by setting the left and right channels in phase or in opposite phase.
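
As a rough illustration of the FIG. 10 processing, the sketch below mixes N selected channels into a stereo pair using the example gains given in the text (2^(-1/2) for the main speaker, N^(-1/2) for the others) and inverts the phase of the other speakers on the right branch. The function name, the list-of-samples representation, and the requirement that all channels have equal length are assumptions for the sake of the example.

```python
import math
from typing import List, Tuple


def mix_fig10(channels: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Mix selected channels A_1..A_N into (left, right).

    channels[0] is the main speaker (A_1); channels[1:] are the other speakers.
    All channels are assumed to hold the same number of samples.
    """
    n = len(channels)
    length = len(channels[0])
    left = [0.0] * length
    right = [0.0] * length
    g_main = 1.0 / math.sqrt(2)   # attenuator 4-1 (example value from the text)
    g_other = 1.0 / math.sqrt(n)  # attenuators 4-2 .. 4-N
    for idx, ch in enumerate(channels):
        gain = g_main if idx == 0 else g_other
        # Main speaker: left and right in phase. Others: right branch inverted.
        sign_r = 1.0 if idx == 0 else -1.0
        for t, x in enumerate(ch):
            left[t] += gain * x        # adder 5L
            right[t] += sign_r * gain * x  # adder 5R
    return left, right
```

In the output, the main speaker's component is identical in both channels (localized at the center), while every other speaker's component has opposite sign in the two channels.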

【００３５】図１１は音声信号処理部２５の別の実施例
を示したものであり、端末において例えば左側スピーカ
から主話者の音声のみが再生され、右側のスピーカから
全話者の混合音声がそのパワーレベルが主話者音声パワ
ーレベルと同等かそれより小さく再生されるようにした
場合である。チャネル分岐部１３により分岐されたそれ
ぞれの右分岐チャネルB_1R〜B_NRに減衰率N^-1/2 の減衰器
4-1R,4-2R,…,4-NR が挿入され主話者の左分岐チャネル
B_1L の減衰器4-1Lの減衰量は０とし、他の左分岐チャネ
ルB_2L〜B_NLの減衰器4-2L,…,4-NLには右分岐チャネルの
減衰率N^-1/2 より充分大きい、例えば無限大の減衰量が
設定されている（即ちチャネルは遮断されている）。従
って左チャネル加算器５Ｌには主話者の選択音声チャネ
ルＡ₁ の音声信号のみが減衰されずに与えられ、右チャ
ネル加算機５Ｒには全選択音声チャネルＡ₁〜Ａ_Nの信
号が減衰器4-1R〜4-NRで適切な音量、例えばN^-1/2 に減
衰されて与えられる。FIG. 11 shows another embodiment of the audio signal processing unit 25, in which, for example, only the main speaker's voice is reproduced from the left speaker at the terminal, and the mixed voice of all speakers is reproduced from the right speaker at a power level equal to or lower than that of the main speaker's voice. Attenuators 4-1R, 4-2R, ..., 4-NR with an attenuation factor of N^(-1/2) are inserted in the right branch channels B_1R to B_NR branched by the channel branching unit 13. The attenuation amount of the attenuator 4-1L on the main speaker's left branch channel B_1L is set to 0, and the attenuators 4-2L, ..., 4-NL on the other left branch channels B_2L to B_NL are set to an attenuation amount sufficiently larger than that of the right branch channels, for example infinite (that is, those channels are cut off). Accordingly, only the audio signal of the main speaker's selected voice channel A_1 is supplied, unattenuated, to the left-channel adder 5L, while the signals of all selected voice channels A_1 to A_N are attenuated by the attenuators 4-1R to 4-NR to an appropriate level, for example N^(-1/2) times, and supplied to the right-channel adder 5R.
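
The FIG. 11 variant can be sketched in the same way: the left output carries only the main speaker, unattenuated, and the right output carries all selected channels scaled by N^(-1/2). As before, the function name and data layout are illustrative assumptions, and equal-length channels are assumed.

```python
import math
from typing import List, Tuple


def mix_fig11(channels: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Mix selected channels A_1..A_N into (left, right) as described for FIG. 11.

    channels[0] is the main speaker (A_1). The left branches of all other
    channels are treated as cut off (infinite attenuation).
    """
    n = len(channels)
    gain = 1.0 / math.sqrt(n)  # attenuators 4-1R .. 4-NR
    left = list(channels[0])   # adder 5L: main speaker only, attenuation 0
    right = [
        gain * sum(ch[t] for ch in channels)  # adder 5R: all speakers
        for t in range(len(channels[0]))
    ]
    return left, right
```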

【００３６】受信端末の受聴者は、これら音像制御処理
された主話者の音声と従たる話者の音声を同時に受聴す
ることになるが、知覚される音像位置が異なる。そのた
め、受聴者は常に主話者の発言を明瞭に聞き取ることが
できると同時に、その他の話者による各地点の発言も同
時に把握することができる多地点通信会議を実現するこ
とができる。The listener of the receiving terminal simultaneously listens to the voice of the main speaker and the voice of the subordinate who have been subjected to the sound image control processing, but the perceived sound image position is different. Therefore, the listener can always hear the speech of the main speaker clearly, and at the same time, it is possible to realize a multipoint communication conference in which the speech of each point by other speakers can be simultaneously grasped.

【００３７】信号処理制御部２０における主たる話者の
決定方式と、図１１の音声信号処理部２５を使って左右
チャネルの加算音声信号を生成しする方式の一例を図１
２に示す。図１２においては、主話者の決定方式とし
て、“ある時点で発言状態と認識された端末のうち、最
初に発言状態と認識された端末を、その発言状態が継続
する間、主話者と決定する”方式を示している。発言状
態の端末が１つのみの場合はその端末を主話者と決定す
る。複数端末が同時に発言状態となったときは、主話者
が非発言状態となった時点で、最も早く発言状態となっ
た端末について新たな主話者と決定する。図１２の行Ａ
〜Ｄはそれぞれ端末TM-1からTM-4における話者の発言期
間ＮＡ，ＮＢ，ＮＣ，ＮＤを斜線領域で示している。行
Ｅは左チャネルの音声の内容、行Ｆは右チャネルの音声
の内容を示す。右チャネルの音声は、主発言者以外の他
の参加者の音声の加算の様子を示す。FIG. 12 shows an example of a method by which the signal processing control unit 20 determines the main speaker and the left- and right-channel added audio signals are generated using the audio signal processing unit 25 of FIG. 11. The main-speaker determination method shown in FIG. 12 is: "among the terminals recognized as being in the speaking state at a given time, the terminal first recognized as speaking is determined to be the main speaker for as long as its speaking state continues." When only one terminal is in the speaking state, that terminal is determined to be the main speaker. When a plurality of terminals are in the speaking state at the same time, at the moment the main speaker becomes non-speaking, the terminal that entered the speaking state earliest is determined to be the new main speaker. Rows A to D of FIG. 12 show, as hatched regions, the speaking periods NA, NB, NC, and ND of the speakers at the terminals TM-1 to TM-4, respectively. Row E shows the content of the left-channel audio, and row F the content of the right-channel audio. The right-channel audio shows how the voices of the participants other than the main speaker are added.
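
The "first to speak holds the floor" rule described for FIG. 12 could be expressed along the following lines. This is a sketch under assumptions: speaking states are given as one dictionary of per-terminal flags per time step, the function name is hypothetical, and ties between terminals that start speaking in the same step are broken by terminal name, which the source does not specify.

```python
from typing import Dict, List, Optional


def track_main_speaker(frames: List[Dict[str, bool]]) -> List[Optional[str]]:
    """Return the main speaker (or None) for each time step.

    frames[i] maps a terminal name to True if that terminal is in the
    speaking state at step i.
    """
    start: Dict[str, int] = {}  # step at which each active terminal began speaking
    main: Optional[str] = None
    result: List[Optional[str]] = []
    for i, flags in enumerate(frames):
        for term, speaking in flags.items():
            if speaking:
                start.setdefault(term, i)
            else:
                start.pop(term, None)
        # The current main speaker keeps the floor while still speaking.
        if main not in start:
            # Otherwise hand over to the terminal that has been speaking longest
            # (tie broken by name: an assumption, not from the source).
            main = min(start, key=lambda t: (start[t], t)) if start else None
        result.append(main)
    return result
```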

【００３８】信号処理制御部２０における主たる話者の
決定方式と、図１１の音声信号処理部２５を使って、左
右チャネルの加算音声信号を生成する方式の別の例を図
１３に示す。図１３に示す例においては、特定の端末
（図１３の例では端末TM-1）に発言優先権を与えてい
る。この制御方式は、会議における議長への優先発言権
付与、講演における講演者への優先発言権付与等に対応
している。図１３の行Ａ〜Ｄは各端末TM-1〜TM-4の発言
内容ＮＡ〜ＮＤを示す。行Ｅは左チャネルの音声信号内
容、行Ｆは右チャネルの音声信号の加算の様子を示す。FIG. 13 shows another example of the method of determining the main speaker in the signal processing control section 20 and the method of generating the added audio signals of the left and right channels by using the audio signal processing section 25 of FIG. In the example shown in FIG. 13, a speech priority is given to a specific terminal (terminal TM-1 in the example of FIG. 13). This control method corresponds to the giving of the preferential speaking right to the chairperson at the conference, the giving of the preferential speaking right to the speaker at the lecture, and the like. Rows A to D of FIG. 13 show the message contents NA to ND of the terminals TM-1 to TM-4. Row E shows the contents of the left-channel audio signal, and row F shows the addition of the right-channel audio signal.
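
One possible reading of the priority scheme of FIG. 13 is sketched below: whenever the priority terminal is speaking it becomes the main speaker, and otherwise the current floor holder keeps the floor while it continues to speak. The exact arbitration in the figure is not reproduced in the text, so this behaviour, the function name, and the argument layout are all assumptions.

```python
from typing import Dict, Optional


def main_speaker_with_priority(speaking: Dict[str, bool],
                               floor_holder: Optional[str],
                               priority_terminal: str = "TM-1") -> Optional[str]:
    """Pick the main speaker for one time step, favouring a priority terminal.

    speaking maps terminal names to their current speaking state;
    floor_holder is the main speaker chosen by the normal rule (or None).
    """
    if speaking.get(priority_terminal, False):
        return priority_terminal  # chairperson / lecturer overrides
    if floor_holder is not None and speaking.get(floor_holder, False):
        return floor_holder
    return None
```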

【００３９】また、図９に示した２つの識別方式につい
ても、通信会議で行われる会議の内容により適・不適が
ある。例えば、複数の対等な端末が自由な討論を行うと
きにおいては、話者の交代が速やかに行われる識別方式
１が優れる。一方、一人一人順番に意見を述べるような
議論形態のときには、意図しない主話者交代が起こりに
くい識別方式２が優れる。The two identification methods shown in FIG. 9 are also more or less suitable depending on the nature of the conference being held. For example, when a plurality of terminals of equal standing hold a free discussion, identification method 1, in which speakers change over quickly, is superior. On the other hand, in a discussion in which each person states an opinion in turn, identification method 2, in which unintended changes of main speaker are unlikely, is superior.

【００４０】従って、信号処理制御部２０の中に図９，
図１２，図１３に例示したような複数の話者検出・加算
制御のアルゴリズムを持ち、会議の進行などに応じて端
末TM-1〜TM-Mからの操作などにより、話者検出・加算制
御のアルゴリズムを切り替える手段を設けることが有効
となる。話者選択部２４は、この実施例の後述する図１
０に示す音声信号処理部２５において音像制御パラメー
タを固定的に設定した場合に設けられるものであり、図
１０の音声信号処理部においてどの組の左右分岐チャネ
ルB_JL,B_JR の音声信号に対しても位相を互いに同位相又
は逆位相に選択して設定でき、またどの選択音声チャネ
ルＡ_J に対しても減衰率を１又は1/N^1/2に選択して設定
できる場合は、どの入力チャネルの音声信号が主話者の
音声信号であってもそのチャネルに対する音像制御パラ
メータの設定と、それ以外のチャネルに対する音像制御
パラメータの設定を、パラメータ設定部１４Ｃにより図
１０における主話者のチャネル（選択音声チャネルＡ
₁ ）に対するパラメータとそれ以外の選択音声チャネル
Ａ₂〜Ａ_Nに対するパラメータとの関係と同様に設定すれ
ば話者選択部２４は不要である。同様に、図１１におい
てパラメータ設定部１４Ｃにより各分岐チャネルに対す
る減衰率の設定を０、1/N^1/2、1/∞のいずれにも選択的
に設定できる場合は、話者選択部２４は不要である。It is therefore effective for the signal processing control unit 20 to hold a plurality of speaker detection / addition control algorithms such as those illustrated in FIGS. 9, 12, and 13, and to provide means for switching among them, for example by operation from the terminals TM-1 to TM-M, according to the progress of the conference. The speaker selection unit 24 is provided when the sound image control parameters in the audio signal processing unit 25 shown in FIG. 10 are set in a fixed manner. If, in the audio signal processing unit of FIG. 10, the phases of the audio signals on any pair of left and right branch channels B_JL, B_JR can be selectively set to be in phase or in opposite phase, and the attenuation factor for any selected voice channel A_J can be selectively set to 1 or 1/N^(1/2), then the speaker selection unit 24 is unnecessary: whichever input channel carries the main speaker's audio signal, the parameter setting unit 14C can set the sound image control parameters for that channel and for the remaining channels in the same relationship as that between the parameters for the main speaker's channel (selected voice channel A_1) and those for the other selected voice channels A_2 to A_N in FIG. 10. Similarly, if in FIG. 11 the parameter setting unit 14C can selectively set the attenuation factor for each branch channel to any of 0, 1/N^(1/2), and 1/∞, the speaker selection unit 24 is unnecessary.

【００４１】また、図１０及び１１では主話者の音声信
号による音像が他の話者の音声信号による音像と区別で
きるように、左右分岐チャネル間における音声信号の位
相を制御する場合及び左右チャネルへの音声信号の分配
を制御する場合を示した。これらの場合は、各端末にお
ける再生部５３Ｌ、５３Ｒとして左右に配置されたスピ
ーカを使う。一方、図１０、１１の信号処理部4-1L、4-
NL、4-1R、4-NRにおける音像制御パラメータとして位相
又は減衰量の代わりに後述の図１８で示す左右音響伝達
関数を使ってもよい。その場合は各端末において再生部
５３Ｌ，５３Ｒとしてステレオヘッドホンを使用しなけ
ればならない。第２実施例図８の実施例を各端末における利用者が複数の会議に関
与可能とするよう変形した実施例を図１４に示す。図１
４に示す実施例では、Ｑ個の会議に対応してＱ個の音声
信号処理装置52-1〜25-Qが設けられ、また各端末TM-1〜
TM-N毎に会議選択部21-1〜21-Nが設けられ、これによっ
て複数の会議を開催できる。更に１端末が同時に２件以
上の会議に参加することを可能としている。FIGS. 10 and 11 show, respectively, the case of controlling the phase of the audio signals between the left and right branch channels and the case of controlling the distribution of the audio signals to the left and right channels, so that the sound image of the main speaker's audio signal can be distinguished from the sound images of the other speakers' audio signals. In these cases, speakers arranged on the left and right are used as the reproduction units 53L and 53R at each terminal. Alternatively, the left and right acoustic transfer functions shown in FIG. 18, described later, may be used in place of the phase or attenuation amount as the sound image control parameters in the signal processing units 4-1L, 4-NL, 4-1R, and 4-NR of FIGS. 10 and 11. In that case, stereo headphones must be used as the reproduction units 53L and 53R at each terminal.

Second Embodiment

FIG. 14 shows an embodiment obtained by modifying the embodiment of FIG. 8 so that the user at each terminal can take part in a plurality of conferences. In the embodiment of FIG. 14, Q audio signal processing units 25-1 to 25-Q are provided corresponding to Q conferences, and conference selection units 21-1 to 21-N are provided for the respective terminals TM-1 to TM-N, so that a plurality of conferences can be held. Furthermore, one terminal can take part in two or more conferences at the same time.

【００４２】会議の参加者は、音声通信制御装置１００
の話者選択部２４に対して、自分の参加する会議を１つ
または複数指定する。ただし、複数の会議を指定すると
きには、自分の音声を加算する（主たる）会議を１件指
定する。その端末による音声信号を加算対象として処理
する会議が唯一に決定される。他の会議については、該
端末による音声信号は加算対象とならず、該参加者は会
議の加算音声を傍聴するだけとなる。Each conference participant designates to the speaker selection unit 24 of the voice communication control apparatus 100 one or more conferences in which to take part. When designating a plurality of conferences, the participant designates one (main) conference to which his or her own voice is to be added, so that the conference in which the audio signal from that terminal is processed as an addition target is uniquely determined. For the other conferences, the audio signal from that terminal is not an addition target, and the participant merely listens in on the added voice of those conferences.

【００４３】論理的な会議の１つとしては、例えば会議
参加者のうち特定２者もしくは３者以上間の対話があ
る。この場合、図１４の１端末は会議を受聴しながら同
時に特定相手との対話を行うことができる。その結果、
全参加者が物理的に同一の会議室に参加しているときと
同様な自然な会話を行うことが可能となる。図１４の実
施例においては、話者選択部２４の内部が論理的に複数
の会議（会議１〜会議Ｑ）対応に分かれ、各会議室毎に
先の図９の説明において述べたと同様の話者検出を行
う。One example of a logical conference is a dialogue between two, or three or more, specific conference participants. In this case, a terminal in FIG. 14 can hold a dialogue with specific parties while listening to the conference. As a result, a conversation as natural as when all participants are physically present in the same conference room becomes possible. In the embodiment of FIG. 14, the speaker selection unit 24 is logically divided internally to correspond to the plurality of conferences (conference 1 to conference Q), and speaker detection similar to that described with reference to FIG. 9 is performed for each conference.

【００４４】図１４の実施例における図８との構成上の
違いは、開催可能な会議数Ｑに足りる数の音声信号処理
部25-1〜25-Qを設けた点にあり、論理的な１つの会議に
音声信号処理部25-1〜25-Qの中の１つを割り当て、その
後段のそれぞれに会議選択部21-1〜21-Nを設ける。この
実施例における各音声信号処理部25-1〜25-Qの実施例と
しては、例えば図１０，または図１１に示した音声信号
処理部を用いることが可能である。The configurational difference of the embodiment of FIG. 14 from that of FIG. 8 is that a sufficient number of audio signal processing units 25-1 to 25-Q are provided for the number of meetings Q that can be held, which is logical. One of the audio signal processing units 25-1 to 25-Q is assigned to one conference, and conference selecting units 21-1 to 21-N are provided at the subsequent stages, respectively. As an embodiment of each audio signal processing unit 25-1 to 25-Q in this embodiment, for example, the audio signal processing unit shown in FIG. 10 or FIG. 11 can be used.

【００４５】図１５は、図１４の第２実施例における会
議選択部21-1〜21-Nの中の端末TM-Jに対応した会議選択
部21-Jの構成例を示したものである。会議選択部21-Jは
Ｑ個の信号処理制御部25-1〜25-Qのそれぞれの左右チャ
ネル音声信号が与えられる会議選択スイッチ7S-1〜7S-Q
と、それらの全ての会議選択スイッチ7S-1〜7S-Qの左チ
ャネル出力に接続された左チャネル用加算器2-L と、会
議選択スイッチ7S-1〜7S-Qの右チャネル出力に接続され
た右チャネル用加算器2-R とから構成される。信号処理
制御部２０は端末TM-Jから受信した参加会議を指定する
制御信号に基づき、指定された会議に対応する１つ又は
複数の会議選択スイッチ7S-P(1≦P≦Q)を導通させ、指
定された会議が選択される。FIG. 15 shows a configuration example of the conference selection unit 21-J corresponding to the terminal TM-J among the conference selection units 21-1 to 21-N in the second embodiment of FIG. 14. The conference selection unit 21-J consists of conference selection switches 7S-1 to 7S-Q, to which the left- and right-channel audio signals of the Q audio signal processing units 25-1 to 25-Q are respectively supplied; a left-channel adder 2-L connected to the left-channel outputs of all the conference selection switches 7S-1 to 7S-Q; and a right-channel adder 2-R connected to their right-channel outputs. Based on the control signal received from the terminal TM-J designating the conferences to take part in, the signal processing control unit 20 closes the one or more conference selection switches 7S-P (1 ≦ P ≦ Q) corresponding to the designated conferences, thereby selecting those conferences.

【００４６】会議１〜会議Ｑに対応した音声信号処理装
置25-1〜25-Qからの左右音声信号出力は端末対応分岐部
１６により分岐されて各端末対応の会議選択部21-Jの中
の会議選択スイッチ7S-1〜7S-Qに送られる。それによっ
て各端末TM-Jにより指定された１つまたは複数の会議の
左右音声信号が選択され、左及び右チャネル用加算器2-
L,2-R に与えられる。例えば同時に２件の会議に参加し
ているときは、２件の参加会議のそれぞれ左チャネル音
声信号は加算器2-L で加算された上で左チャネル音声信
号として出力され、２件の参加会議の右チャネル音声信
号は加算器2-Rで加算され右チャネル音声信号として出
力される。左右チャネル音声信号は図１４の対応する音
声符号化部27-Jで符号化され、対応する端末TM-Jに送信
されるので、端末TM-Jの再生部では２件の会議の混合音
声が再生される。The left and right audio signal outputs from the audio signal processing units 25-1 to 25-Q corresponding to conferences 1 to Q are branched by the terminal-corresponding branching unit 16 and sent to the conference selection switches 7S-1 to 7S-Q in the conference selection unit 21-J for each terminal. The left and right audio signals of the one or more conferences designated by each terminal TM-J are thereby selected and supplied to the left- and right-channel adders 2-L and 2-R. For example, when a terminal takes part in two conferences at the same time, the left-channel audio signals of the two conferences are added by the adder 2-L and output as the left-channel audio signal, and the right-channel audio signals of the two conferences are added by the adder 2-R and output as the right-channel audio signal. The left- and right-channel audio signals are encoded by the corresponding voice encoding unit 27-J in FIG. 14 and transmitted to the corresponding terminal TM-J, so the reproduction unit of the terminal TM-J reproduces the mixed voice of the two conferences.
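
The conference selection unit of FIG. 15 amounts to a set of switches followed by one adder per channel. A minimal sketch, assuming each conference output is a (left, right) pair of equal-length sample lists and that the closed switches are given as a set of zero-based conference indices (both assumptions of this example):

```python
from typing import List, Set, Tuple

Stereo = Tuple[List[float], List[float]]


def select_conferences(conference_outputs: List[Stereo],
                       selected: Set[int]) -> Stereo:
    """Sum the stereo outputs of the selected conferences for one terminal.

    conference_outputs[p] is the (left, right) output of the processing unit
    for conference p + 1; selected holds the indices whose switch is closed.
    """
    length = len(conference_outputs[0][0])
    left = [0.0] * length
    right = [0.0] * length
    for p, (conf_left, conf_right) in enumerate(conference_outputs):
        if p in selected:  # conference selection switch closed
            for t in range(length):
                left[t] += conf_left[t]    # left-channel adder 2-L
                right[t] += conf_right[t]  # right-channel adder 2-R
    return left, right
```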

【００４７】会議の選択部分の構成として、図１５の会
議選択スイッチ7S-1〜7S-Qで前記端末対応の参加会議選
択を行う代わりに、図１４における端末対応分岐部１６
を論理的に入出力が2Q×(2Q・N) のスイッチマトリック
スで構成し、その接点のオン、オフ制御を端末からの会
議選択指示に基づいて信号処理制御部２０により行う構
成とし、会議選択部21-1〜21-Nには該端末で選択された
会議の音声信号のみを与える構成も可能である。第３実施例前述の図８、１１による第１実施例では会議中の主話者
を抽出し、この主話者の音声信号を左チャネルに割り当
て、右チャネルにそれ以外の参加者からの音声信号を加
算して割り当てた場合を示した。左及び右チャネルから
の音声信号は各地点の参加端末に伝送され、各地点にお
いて各チャネル毎に１個の音源を用いて音声を再生す
る。このシステムを３地点以上の端末間における通信に
適用すると、右チャネルにおいて、同時に２地点からの
音声信号が加算されることになる。この場合、受話者が
その２地点からの音声に対する音像位置を各地点毎に分
離して聴取できないという欠点が生じる。また、主話者
が入れ替わっても各端末からの音声信号が常に決まった
チャネルに分配されるとは限らない。従って、受話者は
各話者の音像位置を常に一定に聴取しないという欠点が
生じる。この結果、各発話者の同定及び発言内容の了解
度向上が限定される。この点を改善した実施例を図１６
に示す。As an alternative configuration for conference selection, instead of selecting the conferences for each terminal with the conference selection switches 7S-1 to 7S-Q of FIG. 15, the terminal-corresponding branching unit 16 in FIG. 14 may be configured as a switch matrix with logically 2Q × (2Q·N) inputs and outputs, whose contacts are turned on and off by the signal processing control unit 20 based on the conference selection instructions from the terminals, so that only the audio signals of the conferences selected by each terminal are supplied to the conference selection units 21-1 to 21-N.

Third Embodiment

The first embodiment of FIGS. 8 and 11 described above extracts the main speaker in a conference, assigns the main speaker's audio signal to the left channel, and adds the audio signals from the other participants and assigns the sum to the right channel. The left- and right-channel audio signals are transmitted to the participating terminals at each site, where the sound is reproduced using one sound source per channel. When this system is applied to communication among terminals at three or more sites, audio signals from two sites may be added together in the right channel. In that case, the listener cannot hear the sound images of the voices from those two sites as separated by site. Moreover, because the main speaker changes, the audio signal from each terminal is not always distributed to the same channel, so the listener does not hear each speaker's sound image at a constant position. As a result, the identification of each speaker and the improvement in intelligibility of what is said are limited. FIG. 16 shows an embodiment that improves on these points.

【００４８】図１６に示す実施例も図８に示した本発明
による原理的構成に基づいている。図１６の実施例によ
る音声通信制御装置１００はＮ人の参加者の再生音声が
それぞれ異なる空間位置に定位するように、Ｎ人の参加
者の音声信号に対しそれぞれ異なる組の音響伝達関数を
音像制御パラメータとして使って処理を行う。これによ
って同時に最大Ｎ地点間における通信会議がＮ人の話者
の音声を異なる位置に定位して可能な音声通信を実現す
る。ただし各端末は音声再生部53L,53R （図７）として
ステレオヘッドホンを使う必要がある。各地点の端末か
ら各１回線の音声信号が伝送され、音声通信制御装置１
００へ入力される。一方、音声通信制御装置１００より
各１回線の音声信号が出力され、各地点の端末に返送さ
れる。ただし、出力される各１回線の音声信号は、音声
通信制御装置１００の内部において合成された２チャネ
ルステレオ音声信号が１チャネルに多重化されたもので
ある。勿論、１回線に３チャネル以上の音声信号を多重
化してもよい。The embodiment shown in FIG. 16 is also based on the basic configuration of the present invention shown in FIG. 8. The voice communication control device 100 of this embodiment processes the voice signals of the N participants using a different set of acoustic transfer functions for each participant as the sound image control parameters, so that the reproduced voices of the N participants are localized at different spatial positions. This makes possible a communication conference among up to N points at once in which the voices of the N speakers are localized at different positions. Each terminal, however, must use stereo headphones as the audio reproduction units 53L and 53R (FIG. 7). One line of voice signal is transmitted from the terminal at each point and input to the voice communication control device 100. In turn, one line of voice signal per point is output from the voice communication control device 100 and returned to the terminal at each point. Each output line carries the two-channel stereo voice signal synthesized inside the voice communication control device 100, multiplexed into one channel. Voice signals of three or more channels may of course be multiplexed onto one line.

【００４９】この実施例による音声通信制御装置１００
において、図６におけるチャネル分岐部１３の各チャネ
ル分岐点3-1,…,3-Nと音像制御部１４の左右信号処理部
4-1L,4-1R,4-2L,4-2R,…,4-NL,4-NRのそれぞれ端末毎に
対応する組を音像処理部8-1,8-2,…,8-Nとして示してあ
る。その１つの音像処理部8-1 を代表して図１７に示
す。音像処理部8-1 は図２で説明した原理を使ってチャ
ネル分岐点3-1 で分岐された左右音声信号に対し音響伝
達関数H_1L,H_1R をそれぞれ畳み込み器4-1L,4-1Rにより
畳み込み演算する。その畳み込みによって得られた音声
信号を左チャネル音声信号、右チャネル音声信号として
図１６の加算部１５の加算器５Ｌ，５Ｒに与える。各チ
ャネルの分岐音声信号に畳み込む伝達関数H_1L,H_1R はそ
の音声信号の再生音声を定位させようとする所望の空間
位置に対応づけて決定することができる。In the voice communication control device 100 of this embodiment, each per-terminal set consisting of one of the channel branch points 3-1, ..., 3-N of the channel branching unit 13 and the corresponding left and right signal processing units 4-1L, 4-1R, 4-2L, 4-2R, ..., 4-NL, 4-NR of the sound image control unit 14 in FIG. 6 is shown as a sound image processing unit 8-1, 8-2, ..., 8-N. FIG. 17 shows the sound image processing unit 8-1 as a representative. Using the principle described with reference to FIG. 2, the sound image processing unit 8-1 convolves the acoustic transfer functions H_1L and H_1R with the left and right audio signals branched at the channel branch point 3-1, using the convolvers 4-1L and 4-1R respectively. The audio signals obtained by the convolution are supplied to the adders 5L and 5R of the adding unit 15 in FIG. 16 as the left channel audio signal and the right channel audio signal. The transfer functions H_1L and H_1R convolved with the branched audio signals can be determined in correspondence with the desired spatial position at which the reproduced voice is to be localized.
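
The convolution performed by the sound image processing unit 8-1 can be sketched as follows. This is only an illustration under stated assumptions: the transfer functions are represented as short time-domain impulse responses, the names `convolve` and `localize` are hypothetical, and the toy responses `h_1l` and `h_1r` are invented values, not data from the patent.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two non-empty sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += s * h
    return out


def localize(mono, h_left, h_right):
    """Branch one mono voice signal and convolve each copy with one
    impulse response of the left/right pair (convolvers 4-1L and 4-1R)."""
    return convolve(mono, h_left), convolve(mono, h_right)


# Hypothetical toy responses: the sound reaches the left ear first and
# unattenuated, and the right ear two samples later at half amplitude.
h_1l = [1.0, 0.0, 0.0]
h_1r = [0.0, 0.0, 0.5]

voice = [1.0, 2.0, 3.0]
left, right = localize(voice, h_1l, h_1r)
```

In a real system the impulse responses would be measured or modelled responses for the chosen spatial position, and the convolution would normally run block by block on streaming audio.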

【００５０】交換部１１は回線網を構成する不特定多数
の通信網４０の中から通信回線Ｊ（１≦Ｊ≦Ｍ）を選択
する。ただし、Ｍは同時に接続された端末数を示し、一
般にＭ≦Ｎである。選択された各回線は、同時に音声通
信を行う各端末につき２チャネル接続されているもので
ある。うち、１チャネルは本発明例における音声信号の
入力を媒介し、復号化部23-J (Jは1,2,…,N)に接続され
る。他方の１チャネルは音声信号の出力を媒介し、入力
チャネルＣ_J を通して多重多重・符号化部22-Jに接続さ
れる。各復号化部23-Jは接続された端末から入力された
音声信号を各々復号化する。復号化部23-Jにおいて復号
化された音声信号は増幅率設定部３５及び増幅器36-Jへ
それぞれ供給される。The exchange unit 11 selects communication lines J (1 ≤ J ≤ M) from the unspecified number of communication networks 40 that make up the line network. Here M is the number of terminals connected at the same time, and in general M ≤ N. Each selected line provides two channels for each terminal that takes part in the simultaneous voice communication. One of these channels carries the input of the audio signal in this example of the invention and is connected to the decoding unit 23-J (J = 1, 2, ..., N). The other channel carries the output of the audio signal and is connected to the multiplexing/encoding unit 22-J through the input channel C_J. Each decoding unit 23-J decodes the audio signal input from the connected terminal. The audio signal decoded by the decoding unit 23-J is supplied to both the amplification factor setting unit 35 and the amplifier 36-J.

【００５１】信号処理制御部２０は各端末から交換部１
１を介して伝達された接続確認、等の制御信号を受信す
る。これらの制御信号より接続端末数Ｍを検知する。ま
た検知された接続端末数Ｍをそれぞれ増幅率設定部３５
及びパラメータ設定部１４Ｃへ伝達する。各増幅器36-J
はそれぞれ入力音声信号を増幅率Ｇ_Jで増幅する。増幅
率Ｇ_Jは増幅率設定部３５において決定される。一例と
して、増幅率Ｇ_Jは増幅器36-Jから出力された音声信号
の短時間パワー積分値がそれぞれのチャネルについて等
しくなるように定められる。The signal processing control unit 20 receives control signals, such as connection confirmations, transmitted from each terminal via the exchange unit 11. From these control signals it detects the number M of connected terminals, and it conveys the detected number M to the amplification factor setting unit 35 and to the parameter setting unit 14C. Each amplifier 36-J amplifies its input audio signal by an amplification factor G_J, which is determined by the amplification factor setting unit 35. As one example, the amplification factors G_J are set so that the short-time integrated power of the audio signals output from the amplifiers 36-J is equal across the channels.
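
One way to read the example rule for G_J is sketched below. It is an assumption-laden illustration, not the patent's circuit: the short-time power is taken as the mean square over one frame, `target_power` and `floor` are hypothetical parameters, and the function names are invented.

```python
import math


def short_time_power(frame):
    """Mean-square value of one short frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def equalizing_gains(frames, target_power=1.0, floor=1e-12):
    """Return one gain per channel so that every amplified frame has the
    same short-time power. `floor` only guards against division by zero;
    a practical system would probably also cap the gain on silent input."""
    gains = []
    for frame in frames:
        power = max(short_time_power(frame), floor)
        gains.append(math.sqrt(target_power / power))
    return gains


# Two channels: the second talker is 6 dB quieter than the first.
frames = [[1.0, -1.0, 1.0, -1.0], [0.5, -0.5, 0.5, -0.5]]
gains = equalizing_gains(frames)
amplified = [[g * x for x in frame] for g, frame in zip(gains, frames)]
```

The gain is the square root of the power ratio because power scales with the square of the amplitude.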

【００５２】パラメータ設定部１４Ｃは、音像処理部8-
J において各地点Ｊの端末TM-Jからの音声信号に対して
再生音を各々異なる目的空間位置θ_Jに定位させる音声
信号を合成するための処理に必要な音響伝達関数Ｈ
_JL(θ_J)，Ｈ_JR(θ_J)を設定する。目的空間位置θ_Jと
音響伝達関数Ｈ_JL(θ_J)，Ｈ_JR(θ_J)とは１対１の対応
関係があるので、各入力信号に対して目的空間位置θ_J
を決定すれば各音声信号に畳み込む音響伝達関数Ｈ
_JL(θ_J)，Ｈ_JR(θ_J)を決めることができる。ここで
は、接続端末数Ｍに基づいて各端末から伝送された音声
に対する目的空間位置θ_Jを決定する。図１８にM=5 の
場合を例示するように、仮想空間位置θ_Jを水平面上で
左側方(９０°)から正面(０°)を通り、右側方(−９０
°)まで等角度間隔Δθ=180/(M-1)度で決定する。接続
端末数Ｍによって各地点Ｊの端末TM-Jに対する仮想空間
位置θ_Jは90-180(J-1)/(M-1) 度に定められる。従って
仮想空間位置間隔Δθは最大接続可能地点数Ｎを用いる
場合(M=N) 最も小さくなる。The parameter setting unit 14C sets the acoustic transfer functions H_JL(θ_J) and H_JR(θ_J) that the sound image processing unit 8-J needs in order to synthesize, from the voice signal of the terminal TM-J at each point J, audio signals whose reproduced sound is localized at a different target space position θ_J. Because the target space position θ_J and the acoustic transfer functions H_JL(θ_J), H_JR(θ_J) correspond one to one, determining θ_J for each input signal determines the transfer functions to be convolved with that signal. Here, the target space position θ_J for the voice transmitted from each terminal is determined from the number M of connected terminals. As illustrated for M = 5 in FIG. 18, the virtual space positions θ_J are placed on the horizontal plane from the left side (90°) through the front (0°) to the right side (-90°) at equal angular intervals Δθ = 180/(M-1) degrees. For a given number M of connected terminals, the virtual space position for the terminal TM-J at point J is θ_J = 90 - 180(J-1)/(M-1) degrees. The interval Δθ is therefore smallest when the maximum number N of connectable points is in use (M = N).
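
The placement rule θ_J = 90 - 180(J-1)/(M-1) can be checked with a short sketch. The function name `virtual_positions` is hypothetical, and the handling of M = 1 (a single source placed at the front) is an assumption, since the formula is undefined for that case and the text does not address it.

```python
def virtual_positions(m):
    """Angles in degrees for terminals J = 1..m, spread evenly from
    +90 (left) through 0 (front) to -90 (right)."""
    if m < 1:
        raise ValueError("at least one connected terminal is required")
    if m == 1:
        return [0.0]  # assumption: a lone source is placed at the front
    return [90.0 - 180.0 * (j - 1) / (m - 1) for j in range(1, m + 1)]


positions_m5 = virtual_positions(5)  # the case illustrated in FIG. 18
spacing_m5 = positions_m5[0] - positions_m5[1]
```

For M = 5 the spacing is 45 degrees. With fewer connected terminals the same 180-degree arc is shared by fewer sources, so the spacing grows.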

【００５３】音像処理部8-J は、図１７で説明したよう
に増幅器36-Jから出力された音声信号にパラメータ設定
部１４Ｃにより設定された伝達関数H_JL(θ_J)，H_JR(θ_J)
を各々畳み込み演算して、加算器５Ｌ、５Ｒにそれぞれ
左チャネル音声信号及び右チャネル音声信号として与え
る。これら左右チャネル音声信号を、もしステレオヘッ
ドホンで再生して両耳受聴すれば、受聴者は仮想空間位
置θ_Jに音像を定位することができる。音像処理部8-J
からの左右チャネル音声信号はそれぞれ遅延部D-JL,D-J
R にも供給される。As described with reference to FIG. 17, the sound image processing unit 8-J convolves the audio signal output from the amplifier 36-J with the transfer functions H_JL(θ_J) and H_JR(θ_J) set by the parameter setting unit 14C, and supplies the results to the adders 5L and 5R as the left channel audio signal and the right channel audio signal respectively. If these left and right channel audio signals are reproduced over stereo headphones and heard with both ears, the listener localizes the sound image at the virtual space position θ_J. The left and right channel audio signals from the sound image processing unit 8-J are also supplied to the delay units D-JL and D-JR respectively.

【００５４】加算器５Ｌは音像処理部8-1〜8-Nから入力
された全ての左チャネル音声信号を加算し、得られた左
チャネル加算音声信号を分岐部１６の分岐点６Ｌに与え
る。加算器５Ｒは音像処理部8-1〜8-Nから入力された全
ての右チャネル音声信号を加算し、得られた右チャネル
加算音声信号を分岐点６Ｒに与える。分岐点６Ｌは、加
算器５Ｌから入力された左チャネルの加算音声信号をＮ
個の減算器26-1L 〜26-NL へ分岐する。分岐点６Ｒは、
加算器５Ｒから入力された右チャネル加算音声信号をＮ
個の減算器26-1R〜26-NRへ分岐する。The adder 5L adds all the left channel audio signals input from the sound image processing units 8-1 to 8-N and supplies the resulting left channel summed audio signal to the branch point 6L of the branching unit 16. The adder 5R adds all the right channel audio signals input from the sound image processing units 8-1 to 8-N and supplies the resulting right channel summed audio signal to the branch point 6R. The branch point 6L branches the left channel summed audio signal from the adder 5L to the N subtractors 26-1L to 26-NL. The branch point 6R branches the right channel summed audio signal from the adder 5R to the N subtractors 26-1R to 26-NR.

【００５５】一方、各遅延部D-JLは入力した左チャネル
音声信号を遅延時間τ_JLで遅延させ、減算器26-JL へ出
力する。遅延時間τ_JLは加算器５Ｌにおける音声信号に
対する処理に関わる遅延時間と分岐点６Ｌにおいて音声
信号に対する処理に関する遅延時間の和とする。従っ
て、遅延部D-JLから出力された左チャネル音声信号と分
岐点６Ｌから出力された左チャネル加算音声信号中の音
像処理部8-J から加算器５Ｌに与えられた成分が同位相
となり、減算器26-JL で互いに相殺される。これによっ
てＪ地点の端末TM-Jから受信された音声信号成分が、そ
の端末TM-Jに分岐送信される左チャネル加算音声信号か
らキャンセルされエコーを防止することができる。従っ
て、減算器26-JL から出力され端末TM-Jに返送されるべ
き音声信号は、TM-J以外の端末からの加算音声信号だけ
である。同じ理由から、遅延部D-JRは音像処理部8-J か
ら入力した右チャネル音声信号を遅延時間τ_JRで遅延さ
せ、減算器26-JR へ出力する。遅延時間τ_JRは加算器５
Ｒにおける音声信号に対する処理に関わる遅延時間と分
岐点６Ｒにおいて音声信号に対する処理に関わる遅延時
間の和とする。Meanwhile, each delay unit D-JL delays its input left channel audio signal by a delay time τ_JL and outputs it to the subtractor 26-JL. The delay time τ_JL is the sum of the processing delay of the audio signal in the adder 5L and the processing delay of the audio signal at the branch point 6L. Consequently, the left channel audio signal output from the delay unit D-JL and the component of the left channel summed audio signal output from the branch point 6L that was supplied to the adder 5L by the sound image processing unit 8-J are in phase, and they cancel each other in the subtractor 26-JL. The audio signal component received from the terminal TM-J at point J is thus cancelled from the left channel summed audio signal that is branched and sent back to that terminal, which prevents echo. The audio signal output from the subtractor 26-JL and returned to the terminal TM-J therefore contains only the summed audio signals from the terminals other than TM-J. For the same reason, the delay unit D-JR delays the right channel audio signal input from the sound image processing unit 8-J by a delay time τ_JR and outputs it to the subtractor 26-JR. The delay time τ_JR is the sum of the processing delay of the audio signal in the adder 5R and the processing delay of the audio signal at the branch point 6R.
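
The echo cancellation described above amounts to subtracting each terminal's own, suitably delayed contribution from the common sum. The sketch below shows this for one channel under simplifying assumptions: the combined latency of the adder and branch point is modelled as a whole number of samples (`path_delay`), all signals have equal length, and the function names are hypothetical.

```python
def delay(signal, d):
    """Delay a signal by d samples, keeping its length (zeros shifted in)."""
    d = min(d, len(signal))
    return [0.0] * d + list(signal[:len(signal) - d])


def mix_minus(contributions, path_delay):
    """For each terminal, return the sum of all contributions minus its own.

    contributions: one equal-length sample list per terminal (one channel).
    path_delay: assumed latency, in samples, of the adder plus branch point.
    """
    n = len(contributions[0])
    total = [sum(c[t] for c in contributions) for t in range(n)]
    total = delay(total, path_delay)           # adder 5L and branch point 6L
    returns = []
    for own in contributions:
        aligned = delay(own, path_delay)       # delay unit D-JL
        returns.append([s - a for s, a in zip(total, aligned)])  # subtractor 26-JL
    return returns


contributions = [
    [1.0, 0.0, 0.0, 0.0],  # terminal 1
    [0.0, 2.0, 0.0, 0.0],  # terminal 2
    [0.0, 0.0, 3.0, 0.0],  # terminal 3
]
returned = mix_minus(contributions, path_delay=1)
```

If the delay unit did not match the path latency, the own-voice component would be subtracted at the wrong time and would leave a residue instead of cancelling.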

【００５６】減算器26-JL,26-JR から出力されたエコー
消去された、左右チャネル音声信号は多重多重・符号化
部22-Jに与えられ、互いに多重化され、符号化されて交
換部１１から地点Ｊの端末TM-Jに送信される。このよう
に、各多重多重・符号化部22-Jは左右２チャネルの音声
信号を１チャネルに多重化し、符号化する。その結果、
多重化された各１チャネルの音声信号は符号化後に交換
部１１により、各地点Ｊ（１≦Ｊ≦Ｍ）へ向けて１回線
で伝送される。従って、２チャネルステレオの信号の伝
送に２回線用いることによる回線間遅延時間差を回避で
き、かつ回線数を節減できる。各端末において多重化さ
れた音声信号を復号して音声を再生すれば、その端末の
受話者にとって他地点の端末からの音声を所望の仮想空
間位置θ _Jに定位できる。その結果、各受話者が他送話
者の同定が容易で会話了解性が確保される。また、各地
点において音像定位に関わる音像位置処理手段を用いる
必要がなくなり、経済的システムが実現できる。The echo-cancelled left and right channel audio signals output from the subtractors 26-JL and 26-JR are supplied to the multiplexing/encoding unit 22-J, where they are multiplexed together and encoded, and are then transmitted from the exchange unit 11 to the terminal TM-J at point J. Each multiplexing/encoding unit 22-J thus multiplexes the two left and right channel audio signals into one channel and encodes the result. After encoding, each multiplexed one-channel audio signal is sent by the exchange unit 11 over one line to each point J (1 ≤ J ≤ M). This avoids the inter-line delay difference that would arise from using two lines to carry a two-channel stereo signal, and it also reduces the number of lines. When each terminal decodes the multiplexed audio signal and reproduces the sound, the listener at that terminal can localize the voices from the terminals at the other points at the desired virtual space positions θ_J. As a result, each listener can easily identify the other talkers, and conversational intelligibility is secured. In addition, the individual points no longer need sound image position processing means for sound image localization, so an economical system can be realized.
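
The text does not say how the multiplexing/encoding unit 22-J combines the two channels. Purely as an illustration, the sketch below assumes simple sample interleaving before encoding; the function names are hypothetical and any real implementation may use a different scheme.

```python
def multiplex(left, right):
    """Interleave equal-length left/right sample lists into one stream
    (assumed scheme: L0, R0, L1, R1, ...)."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have equal length")
    stream = []
    for l_sample, r_sample in zip(left, right):
        stream.extend((l_sample, r_sample))
    return stream


def demultiplex(stream):
    """Recover the left and right channels at the terminal."""
    return stream[0::2], stream[1::2]


stream = multiplex([1, 2, 3], [10, 20, 30])
left, right = demultiplex(stream)
```

Because both channels travel in one stream over one line, they experience the same transmission delay, which is the benefit the paragraph above attributes to single-line transmission.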

【００５７】因みに上記の利用条件と異なり、図１６の
実施例では音声通信制御装置１００から各地点Ｊ（１≦
Ｊ≦Ｍ）へ各２チャネルのステレオ音声信号の伝送のた
めに各２回線の通信回線を使用する例を想定している。
その場合、左右のチャネルの音声信号につき各１回線が
使用され、交換部１１は各地点Ｊ毎に各３回線交換する
必要がある。また、図１６中の多重多重・符号化部22-J
及び各地点における多重化及び分離がそれぞれ不要とな
るが、各端末につき１個の多重多重・符号化部22-Jの代
わりに２個の符号化部23-JL,23-JR を要する。また符号
化部23-JL,23-JR の入力側に各々減算器26-JL,26-JR を
接続し、２個の符号化部23-JL,23-JR に音声信号を入力
する必要がある。Incidentally, as a usage condition different from the above, consider a case in which the embodiment of FIG. 16 uses two communication lines per point to transmit the two-channel stereo audio signal from the voice communication control device 100 to each point J (1 ≤ J ≤ M). In that case one line is used for each of the left and right channel audio signals, and the exchange unit 11 must switch three lines for each point J. The multiplexing/encoding unit 22-J in FIG. 16 and the multiplexing and demultiplexing at each point become unnecessary, but two encoding units 23-JL and 23-JR are required per terminal in place of the one multiplexing/encoding unit 22-J. The subtractors 26-JL and 26-JR must also be connected to the input sides of the encoding units 23-JL and 23-JR respectively, so that the audio signals are input to the two encoding units 23-JL and 23-JR.

【００５８】以上説明したように図１６の実施例によれ
ば、各端末において音像定位を目的とする音声信号処理
部を有さなくとも、受話者は他地点からの音声を異なる
位置に定位して受聴可能になる。よって、各地点の受話
者にとって話者同定が容易になり、良好な会話了解度を
確保できる。同時に、通信方式を予め定める必要がなく
なる。As described above, according to the embodiment of FIG. 16, the listener can hear the voices from the other points localized at different positions even if the terminals have no audio signal processing unit for sound image localization. The listener at each point can therefore identify the speakers easily, and good conversational intelligibility is secured. At the same time, the communication method no longer needs to be fixed in advance.

【００５９】上述のようにステレオヘッドホン等を用い
た両耳受聴においても、受話者が各送話者による音声を
各々異なった位置に定位する任意多地点間における音声
通信を経済的に実現可能になる。また、接続端末数Ｍが
同時に接続可能な最大端末数Ｎよりも小さい場合、各送
話者の音像定位位置間隔を拡大することができる。第４実施例ところで、図１９に示すように異なる地点の端末TM-1〜
TM-6のそれぞれの間で音声通信制御装置を介して通信が
行われる場合を考える。ここで、端末組合せTM-1〜TM-3
及び端末組合せTM-3〜TM-6において、それぞれ通信会議
Ｘ及び通信会議Ｙが実現していると仮定する。このと
き、端末TM-1及びTM-2の利用者の音声を端末TM-4〜TM-6
の利用者は受聴できない。逆に端末TM-4〜TM-6の各利用
者の音声を端末TM-1及びTM-2の利用者は受聴できない。
また、端末TM-3の利用者は端末TM-1，TM-2，TM-4〜TM-6
のうちいずれの利用者の音声も受聴でき、端末TM-3の利
用者による音声は端末TM-1,TM-2,TM-4〜TM-6のいずれの
利用者も受聴できる。この方法によって、端末組み合わ
せ間における通信会議に属さない利用者に通信内容を秘
匿させたり、逆に同時進行している複数の通信会議に属
す利用者にいずれかの通信内容を把握させる等、多様な
応用が可能になる。更に、この多地点間通信方法におい
て各送話者の音声が各々異なった仮想空間位置に定位し
て受聴されれば、各送話者の同定が容易になり、発話内
容の了解度が向上するばかりでなく送受話者があたかも
同一空間内に在席する場合と同様な自然な意志疎通が期
待される。As described above, even with binaural listening over stereo headphones or the like, voice communication among an arbitrary number of points, in which the listener localizes the voice of each talker at a different position, can be realized economically. Further, when the number M of connected terminals is smaller than the maximum number N of terminals that can be connected simultaneously, the spacing between the sound image positions of the talkers can be widened.
Fourth Embodiment
Now consider the case, shown in FIG. 19, in which the terminals TM-1 to TM-6 at different points communicate with one another via the voice communication control device. Assume that communication conference X is established among the terminal combination TM-1 to TM-3 and communication conference Y among the terminal combination TM-3 to TM-6. The users of the terminals TM-4 to TM-6 then cannot hear the voices of the users of the terminals TM-1 and TM-2, and conversely the users of the terminals TM-1 and TM-2 cannot hear the voices of the users of the terminals TM-4 to TM-6. The user of the terminal TM-3 can hear the voices of the users of all of TM-1, TM-2, and TM-4 to TM-6, and the voice of the user of TM-3 can be heard by the users of all of those terminals. This arrangement permits a variety of applications, such as keeping the communication content hidden from users who do not belong to a given conference, or conversely letting a user who belongs to several concurrent conferences follow any of them. Furthermore, if in this multipoint communication method the voice of each talker is heard localized at a different virtual space position, each talker becomes easier to identify, the intelligibility of the utterance content improves, and communication as natural as if the talkers and listeners were present in the same space can be expected.

【００６０】しかしながら図１４、１５の実施例で示し
た音声通信制御装置によれば、図１９において端末TM-3
は通信会議Ｘ及びＹの両方を選択して同時受話可能であ
るが、送話は通信会議Ｘ又はＹの選択した一方のみにし
か可能でない。また通信会議ＸとＹの両方で同時に受話
を行う場合、通信会議ＸとＹの受話音声が左右のスピー
カから別々に再生されるが、通信会議Ｘ又はＹによる受
話音声信号中の複数の端末からの音声を異なる位置に定
位することはできない。However, with the voice communication control device shown in the embodiments of FIGS. 14 and 15, the terminal TM-3 in FIG. 19 can select both communication conferences X and Y and receive them simultaneously, but it can transmit only to whichever one of X or Y it has selected. Further, when receiving both conferences X and Y at once, the received voices of X and Y are reproduced separately from the left and right speakers, but the voices from the multiple terminals within the received audio signal of conference X or Y cannot be localized at different positions.

【００６１】このような問題を改善した第４の実施例に
よる通信制御装置の基本的構成を図２０に示す。この第
４実施例による通信制御装置１００の主な構成は交換部
１１と、各端末TM-1〜TM-6から送られてくる音声信号に
音源から両耳に至る間の伝達関数をそれぞれ畳み込んで
発話者の音源位置を定位させる音声処理を施す音像処理
部8-J(J=1,2,…,NここではN=6)と、複数の会議に対応し
て端末組み合わせの割り当てを行う端末端末組合せ割当
部１９と、加算・分岐部17-P(P=1,…,Q,ここではQ=2)
と、組合せ間組合せ間加算部１２とによって構成するこ
とができる。各加算・分岐部17-Pは左右加算器５Ｌ，５
Ｒとそれらの加算出力をそれぞれ分岐する分岐点６Ｌ，
６Ｒからなる。組合せ間組合せ間加算部１２は左右それ
ぞれＮ個の加算器2-JL,2-JR(J=1,2,…,N,ここではN=6)
から成る。なお、同種の構成要素の中から一個の要素を
特定するため、添え字J(1≦J≦N),P(1≦P≦Q) によって
区別した。また左右２チャネルいずれの音声信号を処理
する構成要素であるかを特定するためには添え字Ｌ，Ｒ
によって区別した。FIG. 20 shows the basic configuration of a communication control device according to a fourth embodiment that addresses these problems. The communication control device 100 of the fourth embodiment can be configured mainly of the exchange unit 11; sound image processing units 8-J (J = 1, 2, ..., N; here N = 6), which convolve the audio signal sent from each terminal TM-1 to TM-6 with the transfer functions from a sound source to both ears so as to localize the sound source position of the talker; a terminal combination allocation unit 19, which assigns terminal combinations corresponding to the multiple conferences; adding/branching units 17-P (P = 1, ..., Q; here Q = 2); and an inter-combination adding unit 12. Each adding/branching unit 17-P consists of left and right adders 5L and 5R and branch points 6L and 6R that branch their respective sums. The inter-combination adding unit 12 consists of N adders for each of the left and right sides, 2-JL and 2-JR (J = 1, 2, ..., N; here N = 6). The subscripts J (1 ≤ J ≤ N) and P (1 ≤ P ≤ Q) identify one element among elements of the same kind, and the subscripts L and R identify whether an element processes the left or the right channel audio signal.

【００６２】図２０に示した構成により、本実施例によ
る音声通信制御装置１００の動作を説明する。交換部１
１は回線網を構成する不特定多数の通信網４０の中から
通信回線J(1≦J≦M)を選択する。Ｍは同時に接続された
全端末数を示す。一般にＭ≦Ｎである。Ｎは接続可能な
最大値である。交換部１１は例えば一端末から受信した
通信開始／終了、接続端末指定、接続確認信号等の制御
信号に基づき、その通信回線Ｊを選択してこの例では入
力チャネルＣ_J を通して音像処理部8-J に接続する。各
音像処理部8-J の構成は図１７に示した構成と同じであ
り、図６における１つの分岐点3-J と、左右チャネルの
信号処理部4-JL,4-JR の組に対応している。The operation of the voice communication control device 100 according to this embodiment is described using the configuration shown in FIG. 20. The exchange unit 11 selects communication lines J (1 ≤ J ≤ M) from the unspecified number of communication networks 40 that make up the line network. M is the total number of terminals connected at the same time. In general M ≤ N, where N is the maximum number of terminals that can be connected. Based on control signals received, for example, from a terminal, such as communication start/end, connection terminal designation, and connection confirmation signals, the exchange unit 11 selects the communication line J and, in this example, connects it through the input channel C_J to the sound image processing unit 8-J. The configuration of each sound image processing unit 8-J is the same as that shown in FIG. 17 and corresponds to the set of one branch point 3-J and the left and right channel signal processing units 4-JL and 4-JR in FIG. 6.

【００６３】各音像処理部8-J は各端末TM-Jから送られ
てきた音声信号に伝達関数を畳み込んで各端末TM-Jの発
話者の音声を仮想空間位置に定位させる処理を施す。従
って、音像処理部8-J から出力される音声信号はステレ
オ音声信号である。それぞれの音像処理部8-J で生成さ
れたステレオ音声信号は端末組合せ割当部１９に入力さ
れ、端末組み合わせ毎に仕分けされる。図示の例では図
１９で示したように端末TM-1〜TM-3とTM-3〜TM-6をそれ
ぞれ通信会議ＸとＹに属するものとした場合を示す。Each sound image processing unit 8-J convolves the transfer functions with the audio signal sent from the terminal TM-J so that the voice of the talker at that terminal is localized at a virtual space position. The audio signal output from each sound image processing unit 8-J is therefore a stereo audio signal. The stereo audio signals generated by the sound image processing units 8-J are input to the terminal combination allocation unit 19 and sorted by terminal combination. The illustrated example shows the case in which, as in FIG. 19, the terminals TM-1 to TM-3 and TM-3 to TM-6 belong to the communication conferences X and Y respectively.

【００６４】端末組合せ割当部１９で通信会議ＸとＹに
区分けされたステレオ音声信号は、それぞれ加算・分岐
部17-1,17-2 に与えられ、加算器５Ｌ，５Ｒにより同じ
通信に属する端末間同士で左右のチャネル別に加算され
る。それらＸの組（及びＹの組）の加算結果はそれぞれ
加算・分基部17-1内の左右分岐点６Ｌ，６Ｒによって、
それぞれ同じ組みＸの全端末TM-1〜TM-3対応する組合せ
間加算部１２内左用の加算器2-1L〜2-3L及び右用2-1R〜
2-3Rに配分される。同様にＹの組の加算結果はそれぞれ
加算・分岐部17-2内の左右分岐点６Ｌ，６Ｒによってそ
れぞれ同じ組のＹの全端末TM-3〜TM-6に対応する組合せ
間加算部１２内の左チャネル用加算器2-3L〜2-6L及び右
チャネル用加算器2-3R〜2-6Rに分配される。各対の加算
器2-JL,2-JR はその対が属する全ての通信会議の音声信
号を左右チャネル別に加算してステレオ音声信号を生成
し、そのステレオ音声信号は交換部１１を介して対応す
る各端末TM-1〜TM-6に送信される。The stereo audio signals sorted into the communication conferences X and Y by the terminal combination allocation unit 19 are supplied to the adding/branching units 17-1 and 17-2 respectively, where the adders 5L and 5R add them, separately for the left and right channels, among the terminals belonging to the same conference. The sums for set X are distributed by the left and right branch points 6L and 6R in the adding/branching unit 17-1 to the left adders 2-1L to 2-3L and the right adders 2-1R to 2-3R in the inter-combination adding unit 12, which correspond to all the terminals TM-1 to TM-3 of set X. Similarly, the sums for set Y are distributed by the left and right branch points 6L and 6R in the adding/branching unit 17-2 to the left channel adders 2-3L to 2-6L and the right channel adders 2-3R to 2-6R in the inter-combination adding unit 12, which correspond to all the terminals TM-3 to TM-6 of set Y. Each pair of adders 2-JL and 2-JR adds, separately for the left and right channels, the audio signals of all the conferences to which that pair belongs and so generates a stereo audio signal, which is transmitted via the exchange unit 11 to the corresponding terminal TM-1 to TM-6.
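
The two-stage addition of FIG. 20 (one sum per conference, then one sum per terminal over the conferences it belongs to) can be sketched as follows. The data layout, the function name `mix_conferences`, and the one-sample test signals are assumptions made for illustration. As in the basic configuration described above, this sketch does not remove a terminal's own voice from what it receives.

```python
def mix_conferences(stereo, conferences):
    """stereo: {terminal: (left_samples, right_samples)}, equal lengths.
    conferences: {conference_name: [member terminals]}.
    Returns {terminal: (left_samples, right_samples)} to send back."""
    n = len(next(iter(stereo.values()))[0])

    # Stage 1 (adding/branching units 17-P): one stereo sum per conference.
    sums = {}
    for name, members in conferences.items():
        left = [sum(stereo[t][0][i] for t in members) for i in range(n)]
        right = [sum(stereo[t][1][i] for t in members) for i in range(n)]
        sums[name] = (left, right)

    # Stage 2 (adders 2-JL, 2-JR): each terminal receives the sum of the
    # conference sums for every conference it belongs to.
    out = {}
    for terminal in stereo:
        left, right = [0.0] * n, [0.0] * n
        for name, members in conferences.items():
            if terminal in members:
                left = [a + b for a, b in zip(left, sums[name][0])]
                right = [a + b for a, b in zip(right, sums[name][1])]
        out[terminal] = (left, right)
    return out


# One-sample toy signals: terminal k contributes k on the left, 10k on the right.
stereo = {f"TM-{k}": ([float(k)], [10.0 * k]) for k in range(1, 7)}
conferences = {
    "X": ["TM-1", "TM-2", "TM-3"],
    "Y": ["TM-3", "TM-4", "TM-5", "TM-6"],
}
mixed = mix_conferences(stereo, conferences)
```

In this toy run TM-1 receives only the conference X sum, TM-5 only the conference Y sum, and TM-3, which belongs to both, receives both sums added together.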

【００６５】この例では、各端末TM-1〜TM-6に伝送され
る音声信号はステレオ音声信号であり、各端末TM-1〜TM
-6の受話者は他の端末の発話者の音声を音像処理部8-J
で畳み込んだ伝達関数で決まる仮想空間位置に図２１Ａ
と２１Ｂに示すように定位して受聴することができる。
つまり、図２０の例では端末TM-1〜TM-3から送られてき
た音声信号にそれぞれ伝達関数の畳み込み演算を行い、
互いに加算し、送出するから図２１Ａに示すように端末
TM-1〜TM-3の発話者のみが受聴できる。この端末の組の
通信会議を通信会議Ｘとする。また、端末TM-3〜TM-6か
ら送られてきた音声信号に対しそれぞれ伝達関数の畳み
込み演算して各端末TM-3〜TM-6同士の音声信号を加算し
て端末TM-3〜TM-6に送出するので、端末TM-3〜TM-6の各
受話者は図２１Ｂに示すように各端末TM-3〜TM-6の発話
者の音声を音像処理部で畳み込んだ伝達関数と対応づけ
られる仮想空間位置に定位して受聴することができる。
この端末の組の通信会議を通信会議Ｙとする。ただし、
ここで端末TM-3の受聴者は通信会議ＸとＹの双方に参加
しているから、この端末TM-3の受聴者は図２１Ｃに示す
ように、通信会議ＸとＹの双方の端末TM-1〜TM-6の全て
の発話内容を各異なる仮想空間位置に定位して受聴する
ことができることになる。第５実施例図２０に示す基本的構成の音声通信制御装置１００を具
体的に構成した実施例について図２２及び２３を参照し
て説明する。図２２に示す実施例の利用条件として、複
数端末と本実施例で提案する音声通信制御装置１００間
において１端末につき往復各１回線を用いた音声信号の
伝送による通信を想定する。また、この実施例は同時に
最大Ｎ端末間において最大Ｑ件の通信会議を制御する。
ここで、各端末から各１チャネルのディジタル音声信号
が伝送され、音声通信制御装置１００に入力される。音
声通信制御装置１００より各１チャネルのディジタル音
声信号が出力され各端末へ伝送される。ただし、出力さ
れる各１チャネルの音声信号は、音声通信制御装置１０
０において生成された各２チャネルのステレオ音声信号
が１チャネルに多重化されたものである。In this example, the audio signal transmitted to each terminal TM-1 to TM-6 is a stereo audio signal, and the listener at each terminal can hear the voices of the talkers at the other terminals localized, as shown in FIGS. 21A and 21B, at the virtual space positions determined by the transfer functions convolved in the sound image processing units 8-J. That is, in the example of FIG. 20 the audio signals sent from the terminals TM-1 to TM-3 are each convolved with transfer functions, added together, and sent out, so that, as shown in FIG. 21A, only the talkers at the terminals TM-1 to TM-3 are heard. The communication conference of this set of terminals is referred to as communication conference X. Likewise, the audio signals sent from the terminals TM-3 to TM-6 are each convolved with transfer functions, the signals of the terminals TM-3 to TM-6 are added together, and the sum is sent to the terminals TM-3 to TM-6. Each listener at the terminals TM-3 to TM-6 can therefore hear, as shown in FIG. 21B, the voices of the talkers at those terminals localized at the virtual space positions associated with the transfer functions convolved in the sound image processing units. The communication conference of this set of terminals is referred to as communication conference Y. Because the listener at the terminal TM-3 participates in both conferences X and Y, that listener can hear, as shown in FIG. 21C, the utterances from all the terminals TM-1 to TM-6 of both conferences, each localized at a different virtual space position.
Fifth Embodiment
An embodiment that concretely implements the voice communication control device 100 of the basic configuration shown in FIG. 20 is described with reference to FIGS. 22 and 23. As its usage condition, the embodiment shown in FIG. 22 assumes communication between multiple terminals and the voice communication control device 100 proposed in this embodiment in which the audio signals are carried over one line in each direction per terminal. This embodiment controls up to Q communication conferences at once among up to N terminals. One channel of digital audio signal is transmitted from each terminal and input to the voice communication control device 100, and one channel of digital audio signal per terminal is output from the voice communication control device 100 and transmitted to each terminal. Each output channel, however, carries the two-channel stereo audio signal generated in the voice communication control device 100, multiplexed into one channel.

【００６６】図２２において交換部１１、復号化部23-J
(J=1,…,N)、信号処理制御部２０、増幅率設定部３５、
増幅器36-J(J=1,…,N) 、パラメータ設定部１４Ｃ，音
像処理部8-J(J=1,…,N) ，多重・符号化部22-J(J=1,…,
N)の構成と動作は図１６の実施例における対応するもの
と同じなので説明を省略する。異なる点は，端末端末端
末組合せ割当部１９，端末選択制御部９Ｃ，加算・分岐
部17-P(P=1,…,Q)，断続組合せ決定部７Ｃ，断続部7-P
(P=1,…,Q) ，組合せ間組合せ間加算部１２が設けられ
ている点である。端末端末組合せ割当部１９はQ×N個の
端末選択部9_P-J(P=1,…,Q；J=1,…,N)を有しており、組
合せ間組合せ間加算部１２はＮ対の加算器2-JL,2-JR(J=
1,…,N)を有している。In FIG. 22, the configuration and operation of the exchange unit 11, the decoding units 23-J (J = 1, ..., N), the signal processing control unit 20, the amplification factor setting unit 35, the amplifiers 36-J (J = 1, ..., N), the parameter setting unit 14C, the sound image processing units 8-J (J = 1, ..., N), and the multiplexing/encoding units 22-J (J = 1, ..., N) are the same as those of the corresponding elements in the embodiment of FIG. 16, so their description is omitted. The differences are the addition of the terminal combination allocation unit 19, the terminal selection control unit 9C, the adding/branching units 17-P (P = 1, ..., Q), the intermittent combination determination unit 7C, the intermittent units 7-P (P = 1, ..., Q), and the inter-combination adding unit 12. The terminal combination allocation unit 19 has Q × N terminal selection units 9_P-J (P = 1, ..., Q; J = 1, ..., N), and the inter-combination adding unit 12 has N pairs of adders 2-JL, 2-JR (J = 1, ..., N).

【００６７】各加算・分岐部17-Pは図２３に示すよう
に、加算器５Ｌ，５Ｒ，分岐点６Ｌ，６Ｒ，遅延部D-J
L，D-JR（J=1,…,N) ，減算器26-JL，26-JR（J=1,…,
N）とによって構成される。以下、その実施例において
特徴的部分の機能を説明する。図１６の実施例で説明し
たと同様に接続された端末からの音声信号が復号化部23
-J、増幅器36-Jを経て音像処理部8-J に与えられる。As shown in FIG. 23, each adding/branching unit 17-P consists of adders 5L and 5R, branch points 6L and 6R, delay units D-JL and D-JR (J = 1, ..., N), and subtractors 26-JL and 26-JR (J = 1, ..., N). The functions of the parts characteristic of this embodiment are described below. As described for the embodiment of FIG. 16, the audio signal from each connected terminal is supplied to the sound image processing unit 8-J via the decoding unit 23-J and the amplifier 36-J.

【００６８】信号処理制御部２０は各端末から交換部１
１を介して伝達された通信開始／終了、接続確認、各接
続端末の通信会議への所属、等の制御信号を受信する。
これらの制御信号より、接続端末の数Ｍ，接続端末TM-1
〜TM-n，通信開始／終了、各接続端末の組合せ間におけ
る通信への所属を検知する。また、検知された接続端末
及び接続端末の数Ｍを増幅率設定部３５及びパラメータ
設定部１４Ｃへ、通信開始／終了を端末選択制御部９Ｃ
及び組合せ選択制御部７Ｃへ、各接続端末TM-1〜TM-nの
組合せ間における通信への所属を端末選択制御部９Ｃへ
伝達する。The signal processing control unit 20 receives control signals transmitted from each terminal via the exchange unit 11, such as communication start/end, connection confirmation, and the communication conference to which each connected terminal belongs. From these control signals it detects the number M of connected terminals, the connected terminals TM-1 to TM-n, communication start/end, and the membership of each connected terminal in the communication between combinations. It conveys the detected connected terminals and their number M to the amplification factor setting unit 35 and the parameter setting unit 14C, conveys communication start/end to the terminal selection control unit 9C and the combination selection control unit 7C, and conveys the membership of each connected terminal TM-1 to TM-n in the communication between combinations to the terminal selection control unit 9C.

【００６９】パラメータ設定部１４Ｃは、各音像処理部
8-J において各端末TM-Jからの音声信号に対して再生音
を各端末TM-Jについて各々異なる仮想空間位置θ_Jに定
位させる音声信号を生成するための処理に必要な音響伝
達関数H_L(θ_J)，H_R(θ_J)を設定する。仮想空間位置θ_J
と音響伝達関数H_L(θ_J)，H_R(θ_J)とは１対１の対応関係
があるので、仮想空間位置θ_Jを決定すればその音響伝
達関数H_L(θ_J)，H_R(θ _J)を設定可能になる。本実施例で
は接続端末の数Ｍに基づいて各端末から伝送された音声
に対する仮想空間位置θ_Jを決定する。図２１Ｃに例示
するように、仮想空間位置θ_Jを水平面上に左側方(+90
°) から正面(０°)を通り、右側方(-90°)まで等角度
間隔180/(M-1)度で決定する。つまり、各地点Ｊに対す
る仮想空間位置θ_Jは90-180(J-1)/(M-1) 度で定められ
る。The parameter setting unit 14C sets, in each sound image processing unit 8-J, the acoustic transfer functions H_L(θ_J) and H_R(θ_J) needed to process the audio signal from each terminal TM-J so that the reproduced sound is localized at a virtual space position θ_J that differs for each terminal TM-J. Because the virtual space position θ_J and the acoustic transfer functions H_L(θ_J), H_R(θ_J) are in one-to-one correspondence, determining θ_J makes it possible to set the transfer functions. In this embodiment, the virtual space position θ_J for the voice transmitted from each terminal is determined from the number M of connected terminals. As illustrated in FIG. 21C, the positions θ_J are placed on the horizontal plane from the left side (+90°) through the front (0°) to the right side (-90°) at equal angular intervals of 180/(M-1) degrees. That is, the virtual space position θ_J for each point J is 90 - 180(J-1)/(M-1) degrees.

【００７０】図２１Ｃに示す例ではＭ＝６であるから、
各端末TM-1〜TM-6の各仮想空間位置θ_Jは、 θ_J=1＝ 90°−180°×(1-1)/(6-1)＝+90° θ_J=2＝ 90°−180°×(2-1)/(6-1)＝+54° θ_J=3＝ 90°−180°×(3-1)/(6-1)＝+18° θ_J=4＝ 90°−180°×(4-1)/(6-1)＝-18° θ_J=5＝ 90°−180°×(5-1)/(6-1)＝-54° θ_J=6＝ 90°−180°×(6-1)/(6-1)＝-90° となる。In the example shown in FIG. 21C, M = 6, so the virtual space positions θ_J of the terminals TM-1 to TM-6 are:
θ_{J=1} = 90° - 180° × (1-1)/(6-1) = +90°
θ_{J=2} = 90° - 180° × (2-1)/(6-1) = +54°
θ_{J=3} = 90° - 180° × (3-1)/(6-1) = +18°
θ_{J=4} = 90° - 180° × (4-1)/(6-1) = -18°
θ_{J=5} = 90° - 180° × (5-1)/(6-1) = -54°
θ_{J=6} = 90° - 180° × (6-1)/(6-1) = -90°
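
The equal-angle placement rule θ_J = 90 - 180(J-1)/(M-1) can be checked with a short sketch. This is only an illustration, not part of the disclosed apparatus: the function name `virtual_positions` is hypothetical, and placing a single terminal at the front (0°) is an assumption, since the formula in the text is undefined for M = 1.

```python
def virtual_positions(m):
    """Return azimuths in degrees for m terminals, from left (+90) to right (-90).

    Implements theta_J = 90 - 180 * (J - 1) / (M - 1) for J = 1..M.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        # Assumption (not in the source): a lone terminal is placed at the front.
        return [0.0]
    step = 180.0 / (m - 1)  # equal angular interval between neighbours
    return [90.0 - step * (j - 1) for j in range(1, m + 1)]


# M = 6 as in FIG. 21C
print(virtual_positions(6))  # [90.0, 54.0, 18.0, -18.0, -54.0, -90.0]
```

For M = 6 the result matches the six values listed in paragraph [0070].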

【００７１】音像処理部8-J は、増幅器36-Jから与えら
れた音声信号にパラメータ設定部１４Ｃにおいて端末TM
-Jに従って設定された音響伝達関数H_L(θ_J)，H_R(θ_J)を
各々畳み込み演算して、左右各１チャネルのステレオ音
声信号を生成する。生成されたステレオ音声信号から音
を再生して両耳受聴するとき、受聴者は仮想空間位置θ
_Jに音像を定位する。各音像処理部8-J からの左右チャ
ネル音声信号はＱ個の端末選択部9₁-J，9₂-J，…，9_Q-J
に分配される。端末選択制御部９Ｃは、信号処理制御部
２０より伝達された各通信開始／終了及び各接続端末の
通信会議への所属に基づいて、断続情報を決定し、端末
選択部9_P-Jへ伝達する。例えば、端末TM-Jが所属してい
る通信会議Ｐが開始または終了されたとき、端末選択部
9_P-Jへ音声信号をそれぞれ通過または遮断させるべく制
御信号を転送する。端末選択部9_P-Jは、端末選択制御部
９Ｃから転送された制御信号に従って音声信号を断続す
る。それによって例えば、図２１Ｃに示される端末組み
合わせにおいて、端末TM-1〜TM-3に由来する音声信号が
加算・分岐部17-1に、端末TM-3〜TM-6に由来する音声信
号が加算・分岐部17-2に割り当てられる。The sound image processing unit 8-J convolves the audio signal supplied from the amplifier 36-J with each of the acoustic transfer functions H_L(θ_J) and H_R(θ_J) set for terminal TM-J by the parameter setting unit 14C, generating a stereo audio signal with one left and one right channel. When sound is reproduced from this stereo signal and heard with both ears, the listener localizes the sound image at the virtual space position θ_J. The left and right channel audio signals from each sound image processing unit 8-J are distributed to Q terminal selection units 9_1-J, 9_2-J, ..., 9_Q-J. The terminal selection control unit 9C determines the on/off information from the communication start/end and the conference membership of each connected terminal, as notified by the signal processing control unit 20, and passes it to the terminal selection units 9_P-J. For example, when the communication conference P to which terminal TM-J belongs is started or ended, it sends a control signal to the terminal selection unit 9_P-J to pass or block the audio signal, respectively. The terminal selection unit 9_P-J switches the audio signal on or off according to the control signal from the terminal selection control unit 9C. As a result, in the terminal combination shown in FIG. 21C for example, the audio signals originating from terminals TM-1 to TM-3 are assigned to the adding/branching unit 17-1, and those originating from terminals TM-3 to TM-6 are assigned to the adding/branching unit 17-2.
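
The convolution step performed by each sound image processing unit 8-J can be sketched as follows. This is a minimal illustration under stated assumptions: the transfer functions are represented as short time-domain impulse responses, the values of `h_left` and `h_right` are toy numbers rather than measured responses, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += s * h
    return out


def localize(mono, h_left, h_right):
    """Turn one mono channel into a (left, right) stereo pair.

    h_left and h_right stand in for H_L(theta_J) and H_R(theta_J).
    """
    return convolve(mono, h_left), convolve(mono, h_right)


# Toy data (assumption): the right ear hears a delayed, attenuated copy,
# which pushes the perceived image toward the left.
mono = [1.0, 0.5]
h_left = [1.0]
h_right = [0.0, 0.5]
left, right = localize(mono, h_left, h_right)
print(left)   # [1.0, 0.5]
print(right)  # [0.0, 0.5, 0.25]
```

A practical implementation would likely use measured head-related impulse responses and a faster convolution method; the direct form is used here only to keep the sketch self-contained.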

【００７２】加算・分岐部17-Pの内部構成について図２
３を用いて説明する。各端末選択部9_P-Jから入力された
左右チャネル音声信号はそれぞれ加算器５Ｌ及び５Ｒに
与えれると共に、それぞれ遅延部D-JL及びD-JRに与えら
れる。加算器５Ｌは入力されたＮ個の左チャネル音声信
号を加算し、分岐点６Ｌへ出力する。加算器５Ｒは入力
されたＮ個の右チャネル音声信号を加算し、分岐点６Ｒ
へ出力する。分岐点６Ｌは加算器５Ｌから入力された左
チャネル加算音声信号をＮ個の減算器26-JL(J=1,…,N)
へ分岐する。分岐点６Ｒは加算器５Ｒから入力された右
チャネル加算音声信号をＮ個の減算器26-JR(J=1,…,N)
へ分岐する。The internal configuration of the adding/branching unit 17-P is described with reference to FIG. 23. The left and right channel audio signals input from each terminal selection unit 9_P-J are supplied to the adders 5L and 5R, respectively, and also to the delay units D-JL and D-JR, respectively. The adder 5L sums the N input left-channel audio signals and outputs the result to the branch point 6L. The adder 5R sums the N input right-channel audio signals and outputs the result to the branch point 6R. The branch point 6L branches the left-channel summed audio signal from the adder 5L to the N subtractors 26-JL (J = 1, ..., N). The branch point 6R branches the right-channel summed audio signal from the adder 5R to the N subtractors 26-JR (J = 1, ..., N).

【００７３】遅延部D-JLは、端末選択部9_P-Jから入力さ
れた左チャネル音声信号を遅延時間τ_JLで遅延させ、減
算器26-JL へ出力する。ただし、遅延時間τ_JLは加算器
５Ｌにおける音声信号に対する処理に関わる遅延時間と
分岐点６Ｌにおいて音声信号に対する処理に関わる遅延
時間の和とする。従って、遅延部D-JLから出力された左
チャネル音声信号と、分岐点６Ｌから出力された左チャ
ネル加算音声信号中の端末選択部9_P-Jから出力された左
チャネル音声成分が同期する。遅延部D-JRの遅延時間τ
_JRも同様にして決められており、遅延部D-JRから出力さ
れた右チャネル音声信号と、分岐点６Ｒから出力された
右チャネル加算音声信号中の端末選択部9_P-Jから出力さ
れた右チャネル成分とが同期する。The delay unit D-JL delays the left-channel audio signal input from the terminal selection unit 9_P-J by the delay time τ_JL and outputs it to the subtractor 26-JL. The delay time τ_JL is the sum of the processing delay of the adder 5L and the processing delay of the branch point 6L. The left-channel audio signal output from the delay unit D-JL is therefore synchronized with the left-channel component from the terminal selection unit 9_P-J that is contained in the left-channel summed audio signal output from the branch point 6L. The delay time τ_JR of the delay unit D-JR is determined in the same way, so that the right-channel audio signal output from the delay unit D-JR is synchronized with the right-channel component from the terminal selection unit 9_P-J contained in the right-channel summed audio signal output from the branch point 6R.

【００７４】減算器26-JL 及び26-JR は、それぞれ分岐
点６Ｌ及び６Ｒから与えられた音声信号から遅延部D-JL
及びD-JRから与えられた音声信号をそれぞれ減算する。
その結果、上記同様成分が相殺され、加算・分岐部17-P
においては、各端末TM-Jに対応するチャネルには他チャ
ネルＫ（Ｊ≠Ｋ）に由来する音声信号が加算結果が得ら
れる。その加算音声信号は図２２の断続部7-P へ出力さ
れる。つまり、各端末TM-Jからの音声信号は、自端末TM
-J宛に伝送される音声信号には加算されない。よって、
この実施例の音声通信制御装置１００が介在することに
起因する残響を抑制できる。The subtractors 26-JL and 26-JR subtract the audio signals supplied from the delay units D-JL and D-JR from the audio signals supplied from the branch points 6L and 6R, respectively. As a result, the synchronized components described above cancel, and in the adding/branching unit 17-P the channel corresponding to each terminal TM-J carries the sum of the audio signals originating from the other channels K (K ≠ J). This summed audio signal is output to the on/off unit 7-P of FIG. 22. In other words, the audio signal from each terminal TM-J is not added to the audio signal transmitted back to that same terminal TM-J. Reverberation caused by the presence of the voice communication control device 100 of this embodiment can therefore be suppressed.
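
The sum-then-subtract structure of the adding/branching unit 17-P (adder, branch point, delay units, subtractors) can be sketched for one channel as below. This is a simplified model, not the disclosed circuit: in software all samples are already time-aligned, so the delay units D-JL and D-JR are assumed to be unnecessary, and the function name `add_and_branch` is hypothetical.

```python
def add_and_branch(inputs):
    """Model one channel (left or right) of an adding/branching unit.

    inputs: list of per-terminal sample lists of equal length.
    Returns one list per terminal holding the sum of all OTHER terminals.
    """
    n_samples = len(inputs[0]) if inputs else 0
    # Adder 5L/5R: sum every terminal's signal once.
    total = [sum(ch[t] for ch in inputs) for t in range(n_samples)]
    # Subtractors 26-JL/26-JR: remove each terminal's own contribution.
    return [[total[t] - ch[t] for t in range(n_samples)] for ch in inputs]


mixes = add_and_branch([[1, 2], [10, 20], [100, 200]])
print(mixes)  # [[110, 220], [101, 202], [11, 22]]
```

Summing once and subtracting per terminal needs one N-input adder plus N subtractors, rather than N separate (N-1)-input adders, which appears to be the point of the arrangement in FIG. 23.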

【００７５】図２２の説明に戻って、組合せ選択制御部
７Ｃは、信号処理制御部２０より伝達された通信会議Ｐ
の開始／終了情報に基づき、断続情報を決定する。この
断続情報は断続部7-P ヘ転送される。例えば、通信会議
Ｐを開始または終了するとき、端末選択部9_P-Jへ音声信
号をそれぞれ通過または遮断させることを指示する制御
信号が転送される。断続部7-P は、組合せ選択制御部７
Ｃからの制御信号に従って加算部・分岐部17-P，つまり
減算器26-JL、26-JRからの音声信号出力を一斉に断続す
る。Returning to FIG. 22, the combination selection control unit 7C determines the on/off information from the start/end information of communication conference P notified by the signal processing control unit 20. This on/off information is transferred to the on/off unit 7-P. For example, when communication conference P is started or ended, a control signal instructing that the audio signal be passed or blocked, respectively, is transferred to the terminal selection unit 9_P-J. The on/off unit 7-P switches all audio signal outputs of the adding/branching unit 17-P, that is, of the subtractors 26-JL and 26-JR, on or off together according to the control signal from the combination selection control unit 7C.

【００７６】加算部2-JL及び2-JRは、Ｑ個の端末組み合
わせに対応したＱ個の加算・分岐部17-P(P=1,…,Q)のそ
れぞれの第Ｊチャネルから断続部7-P (P=1,…,Q)により
選択された端末組み合わせP_Sの左右チャネル音声信号を
加算する。記号P_Sは互いに加算する端末組み合わせの番
号であり、0≦P_S≦Qの範囲から最大Ｑ個選択できる。こ
れによって選択された端末組み合わせの対応する左右チ
ャネル音声が加算され、端末TM-Jに送信されることにな
るので、端末TM-Jは複数の選択した端末組み合わせ（複
数の通信会議）に参加している全ての他端末からの音声
を音像定位して受聴することができる。また端末TM-Jか
ら送信された音声は、その端末TM-Jを含む端末組み合わ
せを選択した全ての他の端末に送信されることになる。
各多重・符号化部22-Jは加算部2-JL及び2-JRから入力さ
れた左右チャネル音声信号を多重化し、符号化する。即
ち、多重・符号化部22-Jは左右各２チャネルに相当する
ステレオ音声信号を各１チャネルに多重化し、符号化す
る。その結果、符号化された各１チャネルの多重化音声
信号は各端末TM-Jについて独立に交換部１１に出力さ
れ、各端末TM-J（1≦J≦M）へ向けて１チャネルに多重
化された音声信号が１回線で伝送される。The adders 2-JL and 2-JR add the left and right channel audio signals, respectively, of the terminal combinations P_S selected by the on/off units 7-P (P = 1, ..., Q), taken from the J-th channel of each of the Q adding/branching units 17-P (P = 1, ..., Q) that correspond to the Q terminal combinations. The symbol P_S denotes the numbers of the terminal combinations to be added together; up to Q of them can be selected from the range 0 ≦ P_S ≦ Q. The left and right channel audio of the selected terminal combinations is thus summed and transmitted to terminal TM-J, so terminal TM-J can hear, with sound image localization, the voices of all other terminals participating in the selected terminal combinations (communication conferences). Likewise, the voice transmitted from terminal TM-J is sent to all other terminals that have selected a terminal combination containing TM-J. Each multiplexing/encoding unit 22-J multiplexes and encodes the left and right channel audio signals input from the adders 2-JL and 2-JR. That is, it multiplexes the stereo audio signal, which corresponds to two channels (left and right), into one channel and encodes it. The encoded one-channel multiplexed audio signal is output to the exchange unit 11 independently for each terminal TM-J, and the audio signal multiplexed into one channel is transmitted over one line to each terminal TM-J (1 ≦ J ≦ M).

【００７７】図２２の実施例によれば、例えば、図２１
Ｃに示すように仮に端末TM-1〜TM-6において通信会議が
行われている場合においても、受話者は各端末からの音
声を３６度間隔で各々異なる位置に定位して、端末TM-1
〜TM-3間に閉じた通信会議Ｘ及び端末TM-3〜TM-6に閉じ
た通信会議Ｙが同時に実現可能になる。このとき、端末
TM-3における受話者は、端末TM-1〜TM-2のみならず、端
末TM-4〜TM-6からの双方の音声を聴取できる。その他、
端末TM-1〜TM-6からなる全端末における通信会議が進行
中であっても、端末TM-1〜TM-3間における通信会議Ｘが
実現できる。なお、上述では図２１Ｃにおける端末TM-3
のように、ある端末が複数の通信会議Ｘ，Ｙに参加する
場合の動作について説明したが、例えば図２１Ｃ中の端
末TM-1のように１つの通信会議Ｘのみに参加する場合
は、端末TM-1に対応する例えば加算部2-1L、2-1Rに対
し、通信会議Ｘに対応する１つの断続部のみから音声信
号を通過させればよい。According to the embodiment of FIG. 22, even when a communication conference is being held among terminals TM-1 to TM-6 as shown in FIG. 21C, for example, the listener localizes the voice from each terminal at a different position at 36-degree intervals, and a communication conference X closed among terminals TM-1 to TM-3 and a communication conference Y closed among terminals TM-3 to TM-6 can be realized at the same time. In this case the listener at terminal TM-3 can hear the voices not only from terminals TM-1 and TM-2 but also from terminals TM-4 to TM-6. In addition, even while a conference among all terminals TM-1 to TM-6 is in progress, the conference X among terminals TM-1 to TM-3 can be realized. The operation above was described for a terminal that participates in several conferences X and Y, like terminal TM-3 in FIG. 21C. When a terminal participates in only one conference X, like terminal TM-1 in FIG. 21C, it suffices to pass the audio signal to the adders corresponding to terminal TM-1, for example 2-1L and 2-1R, from only the one on/off unit corresponding to conference X.

【００７８】図２１Ｃに例示したように、全体会議中に
おける特定話者間の打ち合わせや、各通信会議に対する
監視等への他端末間音声通信の応用においても、受話者
は各端末での送話者の音声を各々異なった仮想空間位置
に定位して聴取できるようになる。結果的に、各送話者
の同定が容易になり、発話内容の了解度が向上する。本
実施例によれば仮想空間位置への音像定位を実現するた
めの音像処理部を各端末TM-Jにおいて導入する必要がな
くなるという経済的な利点が生じる。以上の効果から、
参加者全員が恰も同一空間内に在席する場合と同等な自
然な意志疎通が経済的に可能になる。As illustrated in FIG. 21C, even when voice communication among terminals is applied to side discussions between particular speakers during a plenary conference, or to monitoring of individual conferences, the listener can hear the voice of the talker at each terminal localized at a different virtual space position. As a result, each talker is easier to identify and the intelligibility of what is said improves. This embodiment also has the economic advantage that no sound image processing unit for localizing sound images at virtual space positions needs to be installed in each terminal TM-J. Together, these effects make it economically possible to achieve natural communication comparable to having all participants present in the same room.

【００７９】この様に、図２２の実施例においては、Ｑ
個の通信会議に対応してＱ個の加算・分岐部17-1〜17-Q
が設けられており、各端末TM-Jからその端末の利用者が
参加（発言）しようとする１つ又は複数の通信会議を指
定する制御信号を信号処理制御部２０が受信すると、そ
の制御信号を端末選択制御部９Ｃに与える。端末選択制
御部９Ｃはその端末TM-Jからの音声信号に対するＱ個の
端末選択部9P-J(P=1,…,Q)のうち、制御信号により指定
された通信会議に対応する加算・分岐部17-Pに対する１
つ又は複数の端末選択部9P-Jを導通させる。これによっ
て端末TM-Jからの音声信号をその端末が要求した１つ又
は複数の通信会議に接続することができ、その端末TM-J
の利用者はその通信会議に音声を送ることができる。ま
た、図２２の実施例ではＱ個の加算・分岐部17-1〜17-Q
の出力側にＱ個の断続部7-1〜7-Qが設けられており、各
端末TM-Jからその端末の利用者が受聴しようとする１つ
又は複数の通信会議を指定する制御信号を信号処理制御
部２０が受信すると、その制御信号を組合せ選択制御部
７Ｃに与える。組合せ選択制御部７Ｃはその制御信号に
より指定された１つ又は複数の加算・分岐部に対応する
断続部7-P を導通とすることにより、指定された１つ又
は複数の通信会議の内容がその端末TM-Jに送信される。Thus, in the embodiment of FIG. 22, Q adding/branching units 17-1 to 17-Q are provided, one for each of the Q communication conferences. When the signal processing control unit 20 receives from a terminal TM-J a control signal designating one or more conferences in which that terminal's user intends to participate (speak), it passes the control signal to the terminal selection control unit 9C. Of the Q terminal selection units 9_P-J (P = 1, ..., Q) that handle the audio signal from terminal TM-J, the terminal selection control unit 9C turns on the one or more units 9_P-J that lead to the adding/branching units 17-P of the designated conferences. The audio signal from terminal TM-J is thereby connected to the one or more conferences the terminal requested, and the user of terminal TM-J can send voice to those conferences. In addition, the embodiment of FIG. 22 has Q on/off units 7-1 to 7-Q on the output side of the Q adding/branching units 17-1 to 17-Q. When the signal processing control unit 20 receives from a terminal TM-J a control signal designating one or more conferences that the terminal's user wishes to listen to, it passes the control signal to the combination selection control unit 7C. The combination selection control unit 7C turns on the on/off units 7-P corresponding to the one or more adding/branching units designated by the control signal, so that the content of the designated conferences is sent to terminal TM-J.
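
The two selection stages described above (which conferences a terminal speaks into, and which conferences it listens to) can be modelled on a single sample per terminal. This is a deliberately simplified sketch: the `speak` and `listen` dictionaries, the function name `route`, and the treatment of listening as a per-terminal choice are all assumptions made for illustration, and stereo processing is omitted.

```python
def route(samples, speak, listen, n_conferences):
    """Mix one sample per terminal across several conferences.

    samples: {terminal: sample}
    speak / listen: {terminal: set of conference ids the terminal sends to / hears}
    Returns {terminal: mixed sample sent back to that terminal}.
    """
    out = {j: 0 for j in samples}
    for p in range(1, n_conferences + 1):
        # Terminal selection units 9_P-J: only speakers of conference p feed adder p.
        fed = {j: s for j, s in samples.items() if p in speak.get(j, set())}
        total = sum(fed.values())
        for j in samples:
            # On/off stage: conference p reaches terminal j only if j listens to it.
            if p in listen.get(j, set()):
                out[j] += total - fed.get(j, 0)  # a terminal never hears itself
    return out


# FIG. 21C: conference X (id 1) = TM-1..TM-3, conference Y (id 2) = TM-3..TM-6.
membership = {1: {1}, 2: {1}, 3: {1, 2}, 4: {2}, 5: {2}, 6: {2}}
samples = {1: 1, 2: 10, 3: 100, 4: 1000, 5: 10000, 6: 100000}
mixed = route(samples, membership, membership, 2)
print(mixed[3])  # 111011
```

With the distinct powers of ten used as samples, the digits of each output show directly which terminals are heard: terminal 3, a member of both conferences, hears terminals 1, 2, 4, 5 and 6 but not itself.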

【００８０】従って、各端末TM-Jは任意の時点で制御信
号をこの発明の音声通信制御装置に送ることによって、
参加（発言）する通信の変更、追加、削除、離脱が可能
であり、また受聴する通信（会議）の変更、追加、削
除、離脱が可能である。図２２の音声通信制御装置１０
０において各端末TM-Jに対し各２チャネル音声信号（ス
テレオ音声信号）を１回線で送出する代わりに２回線の
通信回線を使用して送出してもよい。その場合、各チャ
ネルの音声信号につき各１回線使用し、交換部１１は各
端末TM-Jについて入出力合わせて各３回線交換する必要
がある。この場合、多重・符号化部22-Jにおける多重化
及び各端末TM-Jにおける多重分離が不要となる。しか
し、このように構成する場合には多重・符号化部22-Jは
左右チャネル用に２個必要となり、構成を複雑にすると
いう不都合が生じる。第６実施例図２４は図２０の原理的構成で示す実施例の変形例の原
理的構成を示す。利用条件は図２０、図２２に示した実
施例と同一である。図２４に示す実施例の音声通信制御
装置では、端末端末組合せ割当部１９を音像処理部8-J
より前段に設けた点が図２０の実施例と異なるだけで、
他の構成は図２０の実施例と同じである。端末端末組合
せ割当部１９を音像処理部8-J より前段側に設けたこと
により、接続端末間の組合せが決定された後に、音像定
位のための音声信号処理を施す。そのため、図２６Ａ，
２６Ｂの例のように各通信会議ＸまたはＹ毎に各端末TM
-1〜TM-3或いはTM-3〜TM-6の音像位置を自由に設定する
ことが可能となる。第７実施例図２５に図２４の原理的な構成で示された実施例の具体
的な構成例を示す。図２２と対応する部分には同一符号
を付して示す。この実施例の構成及び機能のうち大部分
は図２２の実施例と類似する。この実施例においても各
端末は複数の通信会議に同時に参加可能であり、また各
端末における受話者が他端末の送話者による音声を各々
異なった位置に定位する任意多地点間音声通信が可能に
なる。更に、仮想空間位置への音像定位を実現する音像
処理部を各端末TM-Jまたは各端末組み合わせＰにおいて
導入する必要がない点も図２２の実施例と同じである。
図２２の実施例との相違点を中心に説明する。以下、各
部分について述べる。Therefore, by sending a control signal to the voice communication control device of the present invention at any time, each terminal TM-J can change, add, delete, or leave the communications in which it participates (speaks), and can likewise change, add, delete, or leave the communications (conferences) it listens to. In the voice communication control device 100 of FIG. 22, instead of sending the two-channel audio signal (stereo audio signal) to each terminal TM-J over one line, two communication lines may be used. In that case one line is used for the audio signal of each channel, and the exchange unit 11 must switch three lines per terminal TM-J, counting input and output together. Multiplexing in the multiplexing/encoding unit 22-J and demultiplexing in each terminal TM-J then become unnecessary. With this configuration, however, two multiplexing/encoding units 22-J are needed, one each for the left and right channels, which has the drawback of complicating the configuration.
Sixth Embodiment
FIG. 24 shows the basic configuration of a modification of the embodiment whose basic configuration is shown in FIG. 20. The conditions of use are the same as in the embodiments of FIGS. 20 and 22. The voice communication control device of FIG. 24 differs from the embodiment of FIG. 20 only in that the terminal combination assigning unit 19 is placed upstream of the sound image processing units 8-J; the rest of the configuration is the same. Because the terminal combination assigning unit 19 precedes the sound image processing units 8-J, the audio signal processing for sound image localization is applied after the combinations of connected terminals have been determined. The sound image positions of terminals TM-1 to TM-3 or TM-3 to TM-6 can therefore be set freely for each communication conference X or Y, as in the example of FIGS. 26A and 26B.
Seventh Embodiment
FIG. 25 shows a concrete configuration of the embodiment whose basic configuration is shown in FIG. 24. Parts corresponding to those in FIG. 22 are given the same reference numerals. Most of the configuration and functions of this embodiment are similar to those of FIG. 22. In this embodiment too, each terminal can participate in several communication conferences simultaneously, and arbitrary multipoint voice communication is possible in which the listener at each terminal localizes the voices of the talkers at the other terminals at different positions. As in FIG. 22, no sound image processing unit for localizing sound images at virtual space positions needs to be installed in each terminal TM-J or for each terminal combination P. The description below concentrates on the differences from the embodiment of FIG. 22, taking each part in turn.

【００８１】信号処理制御部２０は各地点から交換部１
１を介して伝達された通信開始／終了、接続確認、各接
続端末TM-Jの組合せ間における通信会議Ｐへの所属、等
の制御信号を受信する。これらの制御信号より、接続端
末TM-J、接続端末数Ｍ，通信開始／終了、接続端末組み
合わせ間の通信会議Ｐへの所属、各通信会議Ｐへの所属
端末数を検知する。また、検知された接続端末及び接続
端末数Ｍを増幅率設定部３５へ，通信開始／終了を端末
選択制御部９Ｃ及び組合せ決定部７Ｃへ、各接続端末TM
-Jの通信会議Ｐへの所属を端末選択制御部９Ｃへ，各通
信会議Ｐへの所属端末数Ｍ_Pをパラメータ設定部１４Ｃ
へ伝達する。The signal processing control unit 20 receives control signals transmitted from each point via the exchange unit 11, such as communication start/end, connection confirmation, and the membership of each connected terminal TM-J in the communication conference P held among a combination of terminals. From these control signals it detects the connected terminals TM-J, the number M of connected terminals, communication start/end, membership in the conference P among each combination of connected terminals, and the number of terminals belonging to each conference P. It then passes the detected connected terminals and their number M to the amplification factor setting unit 35, the communication start/end to the terminal selection control unit 9C and the combination determination unit 7C, the membership of each connected terminal TM-J in conference P to the terminal selection control unit 9C, and the number M_P of terminals belonging to each conference P to the parameter setting unit 14C.

【００８２】パラメータ設定部１４Ｃは、各端末組み合
わせＰ毎にその組み合わせＰ内の全端末TM_P-P からの音
声信号に対してそれらの再生音を各々異なる仮想空間位
置θ _PJに定位させる音響伝達関数H_L(θ_PJ)，H_R(θ_PJ)を
それぞれ音像処理部8P-Jに設定する。ここで、仮想空間
位置θ_PJを決定すればその音響伝達関数H_L(θ_PJ)，H
_R(θ_PJ)を設定可能になる。この実施例では、信号処理
制御部２０において検知された通信会議Ｐへの所属端末
数Ｍ_Pに基づいて各地点から伝送された音声に対する仮
想空間位置θ_PJを決定する。図２１Ｃに例示したよう
に、仮想空間位置θ_PJを水平面上に左側方(90°)から正
面(0°) を通り、右側方(-90°) まで等角度間隔180/(M
_P-1)度で決定する。このとき、通信会議Ｐに属する端末
TM-Jに対する番号を順にＪ_P(1≦J_P≦M_P) としたとき、
前述と同様に仮想空間位置θ_PJは90-180(J_P-1)/(M_P-1)度
と定められる。The parameter setting unit 14C sets, in the sound image processing units 8_P-J, for each terminal combination P, the acoustic transfer functions H_L(θ_PJ) and H_R(θ_PJ) that localize the reproduced sounds of the audio signals from all terminals in that combination P at mutually different virtual space positions θ_PJ. Once a virtual space position θ_PJ is determined, the acoustic transfer functions H_L(θ_PJ), H_R(θ_PJ) can be set. In this embodiment, the virtual space position θ_PJ for the voice transmitted from each point is determined from the number M_P of terminals belonging to conference P, as detected by the signal processing control unit 20. As illustrated in FIG. 21C, the positions θ_PJ are placed on the horizontal plane from the left side (90°) through the front (0°) to the right side (-90°) at equal angular intervals of 180/(M_P-1) degrees. If the terminals TM-J belonging to conference P are numbered in order J_P (1 ≦ J_P ≦ M_P), the virtual space position θ_PJ is, as before, 90 - 180(J_P-1)/(M_P-1) degrees.

【００８３】各端末TM-Jに由来する１チャネルの音声信
号はＱ個の端末選択部9_P-J(P=1,…,Q)へ分配される。各
端末選択部9_P-Jは端末選択制御部９Ｃからの制御信号に
従って各１チャネルの音声信号を断続し、端末選択部9_P
-Jを通過した音声信号は対応する音像処理部8_P-Jに与え
られる。各音像処理部8_P-Jは、端末選択部9_P-Jから出力
された音声信号にパラメータ設定部１４Ｃにおいて設定
された音響伝達関数H_L(θ_PJ)，H_R(θ_PJ)を各々畳み込み
演算して２チャネル音声信号とし、加算・分岐部17-Pに
与える。音像処理部8_P-Jは各端末の組Ｐ毎にＮ個(J=1,
…,N)設けられており、これに対し図２２では音像処理
部8-J はＮ個設けられているだけである。また図２２の
各端末選択部9_P-Jは２チャネルの音声信号を断続する点
もこの図２５の実施例と異なる。図２５における各加算
・分岐部17-Pの構成及び動作は図２３に示したものと全
く同じである。The one-channel audio signal originating from each terminal TM-J is distributed to the Q terminal selection units 9_P-J (P = 1, ..., Q). Each terminal selection unit 9_P-J switches its one-channel audio signal on or off according to the control signal from the terminal selection control unit 9C, and the audio signal that passes through a terminal selection unit 9_P-J is given to the corresponding sound image processing unit 8_P-J. Each sound image processing unit 8_P-J convolves the audio signal output from the terminal selection unit 9_P-J with each of the acoustic transfer functions H_L(θ_PJ) and H_R(θ_PJ) set by the parameter setting unit 14C to produce a two-channel audio signal, which it gives to the adding/branching unit 17-P. N sound image processing units 8_P-J (J = 1, ..., N) are provided for each terminal group P, whereas in FIG. 22 only N sound image processing units 8-J are provided in total. The embodiment of FIG. 22 also differs from that of FIG. 25 in that each terminal selection unit 9_P-J there switches a two-channel audio signal. The configuration and operation of each adding/branching unit 17-P in FIG. 25 are exactly the same as those shown in FIG. 23.

【００８４】この図２５の実施例と、図２２の実施例と
では音声信号に対する処理の順序が異なる。図２５の実
施例では、各端末TM-Jからの１チャネルの音声信号が各
端末組み合わせＰ(P=1,…,Q)毎に組分けられた後で、各
通信会議Ｐ毎に独立に音像定位を目的とした２チャネル
の音声信号を生成する。従って各通信会議Ｐ毎に独立に
その通信会議に属するそれぞれの端末TM-Jに異なる仮想
空間位置θ_PJを設定することが許される。つまり、パラ
メータ設定部１４Ｃにおいて受話者に各通信会議Ｐに属
するそれぞれの端末TM-J毎に異なった仮想空間位置θ_PJ
に音声を定位させるための音像制御パラメータとして音
響伝達関数H_L(θ_PJ)，H_R(θ_PJ)を設定することができ
る。The embodiment of FIG. 25 and the embodiment of FIG. 22 differ in the order in which the audio signals are processed. In the embodiment of FIG. 25, the one-channel audio signal from each terminal TM-J is first grouped by terminal combination P (P = 1, ..., Q), and the two-channel audio signals for sound image localization are then generated independently for each communication conference P. A different virtual space position θ_PJ can therefore be set, independently for each conference P, for each terminal TM-J belonging to that conference. In other words, the parameter setting unit 14C can set the acoustic transfer functions H_L(θ_PJ), H_R(θ_PJ) as sound image control parameters that let the listener localize the voice of each terminal TM-J belonging to a conference P at a different virtual space position θ_PJ.

【００８５】ここで、各通信会議Ｐにおいてそれぞれの
端末TM-Jからの音声に対する仮想空間位置間隔を拡大す
る方法として、通信会議Ｐに所属する端末数Ｍ_Pから決
定する方法を示す。一例として、図２６Ａ，２６Ｂに示
す端末組み合わせに適用する。図２６Ａに示すように、
通信会議Ｘは３端末間において行われるので、仮想空間
位置間隔が９０°となる。端末TM-1，TM-2，TM-3に対す
る仮想空間位置は順に左側方(90°)，前方(0°) ，右側
方(-90°) に分布する。通信会議Ｙは４端末間において
行われ、仮想空間位置間隔が60°となる。端末TM-3，TM
-4，TM-5，TM-6に対する仮想空間位置は順に左側方(90
°)，左前方(30°)，右前方(-30°)，右側方(-90°)に
分布する。Here, as a way of widening the spacing of the virtual space positions for the voices from the terminals TM-J in each conference P, a method of determining the positions from the number M_P of terminals belonging to conference P is shown. As an example, it is applied to the terminal combinations of FIGS. 26A and 26B. As shown in FIG. 26A, conference X is held among three terminals, so the virtual space position interval is 90°. The positions for terminals TM-1, TM-2, and TM-3 are, in order, the left side (90°), the front (0°), and the right side (-90°). Conference Y is held among four terminals, so the interval is 60°. The positions for terminals TM-3, TM-4, TM-5, and TM-6 are, in order, the left side (90°), the left front (30°), the right front (-30°), and the right side (-90°).
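
The per-conference placement can be illustrated with a small sketch that assigns positions from the number of members M_P of each conference rather than from the total number of terminals. The function name `conference_positions` and the choice of 0° for a single-member conference are assumptions for illustration only.

```python
def conference_positions(members):
    """Assign azimuths (degrees) to the ordered members of one conference.

    Uses theta = 90 - 180 * (J_P - 1) / (M_P - 1), where M_P = len(members).
    """
    m_p = len(members)
    if m_p == 1:
        # Assumption (not in the source): a lone member is placed at the front.
        return {members[0]: 0.0}
    step = 180.0 / (m_p - 1)
    return {name: 90.0 - step * idx for idx, name in enumerate(members)}


conf_x = conference_positions(["TM-1", "TM-2", "TM-3"])
conf_y = conference_positions(["TM-3", "TM-4", "TM-5", "TM-6"])
print(conf_x)  # {'TM-1': 90.0, 'TM-2': 0.0, 'TM-3': -90.0}
print(conf_y)  # {'TM-3': 90.0, 'TM-4': 30.0, 'TM-5': -30.0, 'TM-6': -90.0}
```

Terminal TM-3 is placed at -90° in conference X and at +90° in conference Y, which is possible only because localization is applied after the terminals have been grouped.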

【００８６】比較のため、全接続端末数Ｍを用いて仮想
空間位置を決定した場合を考える。全接続端末数Ｍは６
であるから、仮想空間位置間隔は３６°となり、図２１
Ｃに示されるよう、端末TM-1〜TM-6に対する仮想空間位
置は、順に左側方(90°)，左前方(54°)，左前方(18
°)，右前方(-18°)，右前方(-54°)，右側方(-90°)
に分布する。図２１Ａ，２１Ｂの実施例で音像定位のた
めの処理を施した後で端末組み合わせ割当を実施するか
ら、各通信会議Ｐ毎に独自に仮想空間位置を設定するこ
とができない。従って、一端末からの音声に対する定位
位置は端末組み合わせによらず一定である。このとき、
図２１Ａ，２１Ｂに示されるよう、端末TM-1〜TM-3から
なる通信会議Ｘについては定位位置分布は左側方(+90
°) から左前方(+18°) に限られる。端末TM-3〜TM-6か
らなる通信会議Ｙについては定位位置分布は左前方(+18
°) から右側方(-90°) となる。For comparison, consider the case where the virtual space positions are determined from the total number M of connected terminals. Since M is 6, the virtual space position interval is 36°, and, as shown in FIG. 21C, the positions for terminals TM-1 to TM-6 are distributed in order at the left side (90°), left front (54°), left front (18°), right front (-18°), right front (-54°), and right side (-90°). In the embodiment of FIGS. 21A and 21B, terminal combinations are assigned after the sound image localization processing, so the virtual space positions cannot be set independently for each communication conference P. The localization position for the voice from a given terminal is therefore the same regardless of the terminal combination. In this case, as shown in FIGS. 21A and 21B, the localization positions for conference X, consisting of terminals TM-1 to TM-3, are confined to the range from the left side (+90°) to the left front (+18°). For conference Y, consisting of terminals TM-3 to TM-6, they range from the left front (+18°) to the right side (-90°).
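
The comparison can be quantified with a short sketch that computes the angular range each conference occupies when positions are fixed from the total number of terminals (M = 6). The helper names `global_positions` and `span` are hypothetical.

```python
def global_positions(m):
    """Azimuths for m >= 2 terminals placed from +90 to -90 degrees."""
    step = 180.0 / (m - 1)
    return [90.0 - step * j for j in range(m)]


def span(positions):
    """Angular range covered by a set of positions, in degrees."""
    return max(positions) - min(positions)


all_pos = global_positions(6)   # TM-1 .. TM-6, 36-degree spacing
x_global = all_pos[0:3]         # conference X: TM-1 .. TM-3
y_global = all_pos[2:6]         # conference Y: TM-3 .. TM-6

print(x_global, span(x_global))  # [90.0, 54.0, 18.0] 72.0
print(y_global, span(y_global))  # [18.0, -18.0, -54.0, -90.0] 108.0
```

Under the global assignment, conference X uses only 72° and conference Y only 108° of the available 180°, with 36° between neighbours. When the positions are derived from M_P instead, each conference spans the full 180° with 90° and 60° spacing respectively.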

【００８７】この様に、図２４及び２５の実施例は端末
組み合わせ間における各通信会議毎に定位位置を自由に
設定することができる。また、各端末組み合わせ（即ち
通信会議）Ｐに属する端末数Ｍ_Pにより定位位置を設定
することにより、各端末からの音声に対する定位位置の
分布及び間隔を図２０及び２２の実施例よりも拡大でき
る。受話者が各端末からの音声を聴取するとき、図２０
及び２２の実施例よりも各送話者の同定が容易になり、
発話内容の了解度が更に向上する。As described above, the embodiments of FIGS. 24 and 25 allow the localization positions to be set freely for each communication conference held among a terminal combination. Moreover, by setting the localization positions according to the number M_P of terminals belonging to each terminal combination (that is, each communication conference) P, the distribution and spacing of the localization positions for the voices from the terminals can be made wider than in the embodiments of FIGS. 20 and 22. When the listener hears the voice from each terminal, each talker is easier to identify than in the embodiments of FIGS. 20 and 22, and the intelligibility of what is said improves further.

【００８８】図２５の実施例によれば、任意の通信会議
への参加端末数が変化した場合、その通信会議の参加端
末に対する音像定位位置を変更することができる。その
場合、その通信会議の全参加端末に対する音像位置を音
声通信制御装置１００の信号処理制御部２０がそれぞれ
の参加端末数に応じて予め決めたそれぞれの端末に対す
る音像定位位置の配置モデル（音響伝達関数H_L(J),H
_R(J) の組）に従って決めることができる。即ち、会議
からの離脱要求や参加要求により参加者の数Ｍが変化し
た場合は、それぞれの参加者に割り当てる仮想空間位置
を更新された参加者数に対応する配置モデルを参照して
新たに決定し、それに従って対応する組の音響伝達関数
H_L(J),H_R(J) をそれぞれ選択し、各音像処理部8-J に設
定する。通信会議の初期手順として参加数に対応して取
り得る位置を予め決め、それらの位置に対する参加者の
割当を参加者に選択させることもできる。According to the embodiment of FIG. 25, when the number of terminals participating in a communication conference changes, the sound image localization positions for the terminals in that conference can be changed. In that case, the signal processing control unit 20 of the voice communication control device 100 can determine the sound image positions for all participating terminals according to placement models of localization positions (sets of acoustic transfer functions H_L(J), H_R(J)) determined in advance for each possible number of participating terminals. That is, when the number M of participants changes because of a request to leave or join the conference, the virtual space position assigned to each participant is newly determined by referring to the placement model for the updated number of participants, the corresponding set of acoustic transfer functions H_L(J), H_R(J) is selected accordingly, and the functions are set in the sound image processing units 8-J. As an initial procedure of a conference, the available positions can also be determined in advance according to the number of participants, and the participants can be allowed to choose how they are assigned to those positions.

【００８９】図２２及び２５の各実施例において、任意
の通信会議の初期設定としてパラメータ設定部１４Ｃに
より音像制御パラメータを設定する場合、端末からの参
加要求信号に基づいてその通信会議の参加端末数Ｍが検
出されると、それら参加端末に割り当てることができる
仮想空間位置を信号処理制御部２０において計算により
求める。上述ではそれら求められた仮想空間位置のそれ
ぞれをパラメータ設定部１４Ｃがどの参加者に割り当て
るか決定した場合で説明したが、通信会議の行われてい
る間において、どの参加者に対しても自分に割り当てら
れている現在の仮想空間位置を他の所望の仮想空間位置
に変更できるようにすることもできる。例えば、音声通
信制御装置１００はある通信会議の全参加者に対する最
初に決定した仮想空間位置情報を予め全参加端末に送信
しておく。通信会議中においてある端末利用者が自分の
現在位置を希望の位置に変更したい場合、その希望位置
を示す変更要求信号を音声通信制御装置１００に送信す
る。音声通信制御装置１００の信号処理制御部２０は受
信した位置変更要求信号に基づいて、例えば要求元の現
位置と希望位置に対する現在の参加者割当を入れ替え
て、新しい割当情報を全参加者に通知する。In each of the embodiments of FIGS. 22 and 25, when the parameter setting unit 14C sets the sound image control parameters as the initial setting of a communication conference, the number M of participating terminals is detected from the participation request signals from the terminals, and the signal processing control unit 20 then calculates the virtual space positions that can be assigned to those terminals. The description above assumed that the parameter setting unit 14C decides which participant receives each of these positions, but any participant can also be allowed, while the conference is in progress, to change the virtual space position currently assigned to him or her to another desired position. For example, the voice communication control device 100 sends the initially determined virtual space position information for all participants of a conference to all participating terminals in advance. When a terminal user wishes to move from the current position to a desired position during the conference, the terminal sends a change request signal indicating the desired position to the voice communication control device 100. Based on the received position change request signal, the signal processing control unit 20 of the voice communication control device 100 swaps, for example, the current participant assignments of the requester's current position and the desired position, and notifies all participants of the new assignment information.
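
The position change by swapping can be sketched as follows. This is a hedged illustration: the dictionary representation of the assignment, the function name `swap_position`, and the rejection of positions that are not existing slots are assumptions, since the text gives swapping only as one example of how a change request may be handled.

```python
def swap_position(assignment, requester, desired):
    """Return a new {participant: azimuth} mapping after a change request.

    The requester takes the desired slot; whoever held it takes the
    requester's old slot. Positions are assumed to be unique.
    """
    if desired not in assignment.values():
        # Assumption: only positions already in the placement can be requested.
        raise ValueError("desired position is not an available slot")
    new_assignment = dict(assignment)
    for other, position in assignment.items():
        if position == desired:
            new_assignment[other] = assignment[requester]
    new_assignment[requester] = desired
    return new_assignment


before = {"TM-1": 90.0, "TM-2": 0.0, "TM-3": -90.0}
after = swap_position(before, "TM-1", -90.0)
print(after)  # {'TM-1': -90.0, 'TM-2': 0.0, 'TM-3': 90.0}
```

The function returns a new mapping instead of modifying the old one, so the updated assignment can be sent to all participants while the previous state remains available for comparison.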

【００９０】前述した図８及び１４の実施例において、
音声検出処理部２３Ｂの出力側の各チャネルに破線で示
すようにスイッチSW-1〜SW-Nを直列に挿入し、音声検出
処理部２３Ｂにより各チャネルが発話状態であるか否か
判定し、発話期間のみそのチャネルのスイッチSW-JをＯ
Ｎとし、それ以外はＯＦＦとする事により、そのチャネ
ルに入力される雑音を遮断するようにしてもよい。発話
状態の検出は図８で説明したように音声信号の短時間パ
ワー積分値が閾値Ｅ_ON以上であるか否かによって判定で
きる。同様に図１６、２２、２５の各実施例において
も、復号化部23-1〜23-Nの出力側に破線で示すようにス
イッチSW-1〜SW-Nをそれぞれ挿入し、増幅率設定部３５
で各チャネルの音声信号から発話状態か否かを判定し、
発話状態の期間のみスイッチをＯＮとしてもよい。In the embodiments of FIGS. 8 and 14 described above, switches SW-1 to SW-N may be inserted in series in the respective channels on the output side of the voice detection processing unit 23B, as shown by the broken lines. The voice detection processing unit 23B determines whether each channel is in the utterance state and turns the switch SW-J of that channel ON only during the utterance period and OFF otherwise, thereby blocking noise input to that channel. As described with reference to FIG. 8, the utterance state can be detected by checking whether the short-time power integral of the voice signal is equal to or greater than the threshold E_ON. Similarly, in each of the embodiments of FIGS. 16, 22 and 25, switches SW-1 to SW-N may be inserted on the output side of the decoding units 23-1 to 23-N, as shown by the broken lines; the amplification factor setting unit 35 then determines from the voice signal of each channel whether the channel is in the utterance state, and each switch is turned ON only during the utterance state.
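
As a rough illustration of the switch control described in this paragraph, the sketch below gates one channel frame by frame. It assumes the short-time power integral can be approximated by the mean squared sample value over a non-empty frame; the function names, the frame length, and the numeric threshold are hypothetical and not taken from the specification.

```python
def short_time_power(frame):
    """Mean squared amplitude of one non-empty frame (assumed power measure)."""
    return sum(sample * sample for sample in frame) / len(frame)


def gate_frames(frames, e_on):
    """Pass a frame only while its short-time power is at least e_on.

    A blocked frame is replaced by silence, which models the switch
    SW-J being OFF for that period.
    """
    gated = []
    for frame in frames:
        switch_on = short_time_power(frame) >= e_on
        gated.append(list(frame) if switch_on else [0.0] * len(frame))
    return gated


# First frame: low-level noise. Second frame: speech-level signal.
frames = [[0.01, -0.02, 0.01, 0.0], [0.5, -0.4, 0.6, -0.5]]
gated = gate_frames(frames, e_on=0.01)
```

A practical detector would likely add hangover time so that the switch does not chatter between words; that refinement is omitted here.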

【００９１】[0091]

【発明の効果】以上詳細に説明したように、この発明の
音声通信制御装置によれば、各端末からの音声信号を複
数チャネルに分岐し、分岐チャネル毎にそれぞれの端末
からの分岐音声信号を加算して複数チャネルの加算音声
信号を生成し、その複数チャネル加算音声信号をそれぞ
れの端末に分岐して送信するので、各端末において特別
な音像定位処理を行わないでも少なくとも１人の通信会
議参加者の音像を他の参加者の音声の音像と区別して再
生することができる。As described in detail above, the voice communication control device of the present invention branches the voice signal from each terminal into a plurality of channels, adds the branched voice signals from the terminals for each branch channel to generate multi-channel added voice signals, and distributes those multi-channel added voice signals to the respective terminals. Consequently, the sound image of at least one conference participant can be reproduced so that it is distinguished from the sound images of the other participants' voices, without any special sound image localization processing at the terminals.
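
The branch, control, add and distribute flow summarized above can be sketched as follows. This is a simplified model under stated assumptions: each sound image control parameter is taken to be a plain per-channel gain (the specification also allows other parameter types such as phase or acoustic transfer functions), and the function names `mix_conference` and `distribute` are hypothetical.

```python
def mix_conference(inputs, parameter_groups):
    """Branch N input signals into K channels, weight them, and add per channel.

    inputs: N equal-length lists of samples, one per terminal.
    parameter_groups: N tuples of K gains (assumed parameter type).
    Returns K lists of samples: the K-channel added voice signal.
    """
    num_samples = len(inputs[0])
    num_channels = len(parameter_groups[0])
    added = [[0.0] * num_samples for _ in range(num_channels)]
    for signal, gains in zip(inputs, parameter_groups):
        for channel, gain in enumerate(gains):      # channel branching
            for t, sample in enumerate(signal):
                added[channel][t] += gain * sample  # control, then addition
    return added


def distribute(added, num_terminals):
    """Give every terminal its own copy of the K-channel added signal."""
    return [[list(channel) for channel in added] for _ in range(num_terminals)]


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # N = 3 terminals
gains = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]       # K = 2: left, centre, right
added = mix_conference(inputs, gains)
per_terminal = distribute(added, num_terminals=3)
```

Because the localization is done once in the control device, each terminal only needs to reproduce the K channels it receives.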

[Brief description of drawings]

【図１】音像定位を目的とした音響伝達関数を説明する
ための図。FIG. 1 is a diagram for explaining an acoustic transfer function for the purpose of sound image localization.

【図２】音像定位を目的とした音声信号処理の例を説明
するための図。FIG. 2 is a diagram for explaining an example of audio signal processing for the purpose of sound image localization.

【図３】従来の多地点間音声通信用端末の構成例を示す
ブロック図。FIG. 3 is a block diagram showing a configuration example of a conventional multipoint voice communication terminal.

【図４】従来の多地点間音声通信における通信回線網構
成例を示すブロック図。FIG. 4 is a block diagram showing a configuration example of a communication line network in conventional multipoint voice communication.

【図５】この発明の音声通信制御装置１００が使用され
る通信回線の収容例を示すブロック図。FIG. 5 is a block diagram showing an example of accommodating a communication line in which the voice communication control device 100 of the present invention is used.

【図６】この発明の音声通信制御装置の基本的構成を示
すブロック図。FIG. 6 is a block diagram showing a basic configuration of a voice communication control device of the present invention.

【図７】図５のシステムにおいて使用される各端末の構
成例を示すブロック図。FIG. 7 is a block diagram showing a configuration example of each terminal used in the system of FIG. 5.

【図８】この発明による第１の実施例の音声通信制御装
置を示すブロック図。FIG. 8 is a block diagram showing a voice communication control device according to a first embodiment of the present invention.

【図９】発言識別方法を説明するための図。FIG. 9 is a diagram for explaining an utterance identification method.

【図１０】図８における音声信号処理部２５の構成例を
示すブロック図。FIG. 10 is a block diagram showing a configuration example of the audio signal processing unit 25 in FIG. 8.

【図１１】図８における音声信号処理部２５の他の構成
例を示すブロック図。FIG. 11 is a block diagram showing another configuration example of the audio signal processing unit 25 in FIG. 8.

【図１２】第１の主話者識別方法と図１１の動作例を説
明するためのタイムチャート。FIG. 12 is a time chart for explaining a first main speaker identifying method and an operation example of FIG. 11.

【図１３】第２の主話者識別方法と図１１の動作例を説
明するためのタイムチャート。FIG. 13 is a time chart for explaining a second main speaker identifying method and an operation example of FIG. 11.

【図１４】この発明による第２の実施例の音声通信制御
装置を示すブロック図。FIG. 14 is a block diagram showing a voice communication control device according to a second embodiment of the present invention.

【図１５】図１４における会議選択部の構成例を示すブ
ロック図。FIG. 15 is a block diagram showing a configuration example of the conference selection unit in FIG. 14.

【図１６】この発明による第３の実施例の音声通信制御
装置を示すブロック図。FIG. 16 is a block diagram showing a voice communication control device according to a third embodiment of the present invention.

【図１７】図１６における音像処理部8-1 の構成例を示
すブロック図。FIG. 17 is a block diagram showing a configuration example of the sound image processing unit 8-1 in FIG. 16.

【図１８】音像定位位置を説明するための図。FIG. 18 is a diagram for explaining a sound image localization position.

【図１９】複数の各通信会議に属す端末組み合わせを説
明するための図。FIG. 19 is a diagram for explaining a combination of terminals belonging to each of a plurality of communication conferences.

【図２０】この発明による第４の実施例の音声通信制御
装置を示すブロック図。FIG. 20 is a block diagram showing a voice communication control device according to a fourth embodiment of the present invention.

【図２１】Ａは図２０の実施例による１つの通信会議に
おける音像定位位置例を示す図、Ｂは図２０の実施例に
よるもう１つの通信会議における音像定位位置例を示す
図、Ｃは図２０の実施例による２つの通信会議に参加し
た場合の音像定位位置を示す図。FIG. 21A is a diagram showing an example of sound image localization positions in one communication conference according to the embodiment of FIG. 20, FIG. 21B is a diagram showing an example of sound image localization positions in another communication conference according to the embodiment of FIG. 20, and FIG. 21C is a diagram showing the sound image localization positions when participating in both communication conferences according to the embodiment of FIG. 20.

【図２２】図２０の実施例の具体的構成例を示す第５実
施例のブロック図。FIG. 22 is a block diagram of a fifth embodiment showing a specific configuration example of the embodiment of FIG. 20.

【図２３】図２２における各加算・分岐部17-Pの構成例
を示すブロック図。FIG. 23 is a block diagram showing a configuration example of each adding/branching unit 17-P in FIG. 22.

【図２４】図２０の変形実施例を示す第６実施例のブロ
ック図。FIG. 24 is a block diagram of a sixth embodiment showing a modification of FIG. 20.

【図２５】図２４の実施例の具体的構成例を示す第７実
施例のブロック図。FIG. 25 is a block diagram of a seventh embodiment showing a specific configuration example of the embodiment of FIG. 24.

【図２６】Ａは図２４、２５の実施例により１つの通信
会議において可能な音像定位位置例を示す図、Ｂは図２
４、２５の実施例により他の通信会議において可能な音
像定位位置例を示す図。FIG. 26A is a diagram showing an example of sound image localization positions possible in one communication conference according to the embodiments of FIGS. 24 and 25, and FIG. 26B is a diagram showing an example of sound image localization positions possible in another communication conference according to the embodiments of FIGS. 24 and 25.

フロントページの続き (72)発明者 林 伸夫 東京都千代田区内幸町１丁目１番６号 日本電信電話株式会社内 Continuation of front page: (72) Inventor Nobuo Hayashi, c/o Nippon Telegraph and Telephone Corporation, 1-1-6 Uchisaiwaicho, Chiyoda-ku, Tokyo

Claims

[Claims]

1. A voice communication control device that connects at least three terminals through a communication network to hold a communication conference, comprising: an exchange unit that switches voice signals received from the terminals via the communication network; N input channels that are connected to the exchange unit and to which the voice signals from the respective terminals are input, N being an integer of 3 or more; a channel branching unit that branches each of the input voice signals from the N input channels into branch voice signals of K branch channels, K being an integer of 2 or more; a sound image control unit that processes the branch voice signals of the K branch channels corresponding to the N input channels using N parameter groups, each including K sound image control parameters of a predetermined type, to generate sound image control voice signals for each of the K branch channels, at least one of the N parameter groups differing from the other parameter groups in accordance with a target sound image position; an addition unit that adds, for each branch channel, the sound image control voice signals of the K branch channels corresponding to the N input channels to generate K-channel added voice signals; and a terminal-corresponding branching unit that distributes the K-channel added voice signals to each of the terminals and supplies them to the exchange unit.

2. The voice communication control device according to claim 1, further comprising: a speaker selecting unit that is inserted between the N input channels and the channel branching unit and that selects and outputs, from the input voice signals input to the N input channels through the exchange unit, the input voice signals to be added together; N selected voice channels that supply the input voice signals selected and output by the speaker selecting unit to the channel branching unit; and a signal processing control unit that controls the speaker selecting unit so that, among the input voice signals input from the respective terminals to the N input channels through the exchange unit, the input voice signal to be processed with the at least one parameter group is output to a predetermined one of the N selected voice channels and the other input voice signals are output to the remaining selected voice channels.

3. The voice communication control device according to claim 1, further comprising a signal processing control unit that discriminates, among the input voice signals of the respective input channels, the one with the highest priority as the voice signal of the main speaker and the others as the voice signals of other speakers, wherein the number of branches K of the channel branching unit is 2, the channel branching unit branches the input voice signal supplied from each input channel into two-channel branch voice signals and outputs them as the branch voice signals of the K branch channels, and the sound image control unit includes a phase control unit that processes the two-channel branch voice signals corresponding to the voice signal of the main speaker so that they are in phase with each other and processes the two-channel branch voice signals corresponding to the voice signals of the other speakers so that they are in opposite phase to each other.

4. The voice communication control device according to claim 2, wherein the number of branches K of the channel branching unit is 2; the signal processing control unit discriminates, among the N input voice signals from the respective terminals, the one with the highest priority as the voice signal of the main speaker and the others as the voice signals of other speakers; the speaker selecting unit, in accordance with the discrimination result of the signal processing control unit, outputs the voice signal of the main speaker to the predetermined one of the N selected voice channels and the voice signals of the other speakers to the other selected voice channels, and supplies them to the channel branching unit; the channel branching unit branches each of the supplied N input voice signals and outputs first and second branch channel voice signals as the branch voice signals of the K branch channels; and the sound image control unit includes a phase control unit that, under the control of the signal processing control unit, sets the first and second branch channel voice signals corresponding to the voice signal of the main speaker from the predetermined one selected voice channel so that they are in phase with each other, and sets the first and second branch channel voice signals corresponding to the voice signals of the other speakers from the other selected voice channels so that they are in opposite phase to each other.

5. The voice communication control device according to claim 3 or 4, wherein the sound image control unit includes an attenuator that, under the control of the signal processing control unit, reduces the level of the voice signals of the other speakers relative to the level of the voice signal of the main speaker.
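
For illustration only, the phase control of claims 3 and 4 combined with the attenuation of claim 5 can be modelled as a choice of two-channel gain pairs: equal signs for the main speaker (in phase) and opposite signs with a reduced magnitude for the other speakers (opposite phase, attenuated). The function name `phase_parameters` and the default attenuation factor are hypothetical; the claims do not specify numeric values.

```python
def phase_parameters(num_inputs, main_index, other_gain=0.5):
    """Build one (first channel, second channel) gain pair per input channel.

    main_index: input channel judged to carry the main speaker.
    other_gain: assumed attenuation factor (< 1) for the other speakers.
    """
    groups = []
    for index in range(num_inputs):
        if index == main_index:
            groups.append((1.0, 1.0))                  # in phase, full level
        else:
            groups.append((other_gain, -other_gain))   # opposite phase, attenuated
    return groups


# Three input channels, with channel 1 identified as the main speaker.
groups = phase_parameters(num_inputs=3, main_index=1)
```

When the main speaker changes, the pairs would simply be regenerated with the new `main_index`.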

6. The voice communication control device according to claim 1, wherein the number of branches K of the channel branching unit is 2; the channel branching unit branches each of the supplied N input voice signals and outputs first and second branch channel voice signals as the branch voice signals of the K branch channels; and the device further comprises a signal processing control unit that discriminates, among the N input voice signals of the respective input channels, the one with the highest priority as the voice signal of the main speaker and the others as the voice signals of other speakers, and that sets the parameter groups in the sound image control unit so that, for the voice signal of the main speaker, a sufficiently larger attenuation is applied to the second branch channel voice signal than to the first branch channel voice signal, and, for each of the voice signals of the other speakers, a sufficiently larger attenuation is applied to the first branch channel voice signal than to the second branch channel voice signal.

7. The voice communication control device according to claim 2, wherein the number of branches K of the channel branching unit is 2; the signal processing control unit discriminates, among the N input voice signals from the respective input channels, the voice signal of the main speaker and the voice signals of other speakers; the speaker selecting unit outputs the voice signal of the main speaker to the predetermined one of the selected voice channels and the voice signals of the other speakers to the other selected voice channels, and supplies them to the channel branching unit; the channel branching unit branches each of the supplied N input voice signals and outputs first and second branch channel voice signals as the branch voice signals of the K branch channels; and the signal processing control unit sets the parameter groups in the sound image control unit so that, for the voice signal of the main speaker from the predetermined one selected voice channel, a sufficiently larger attenuation is applied to the second branch channel voice signal than to the first branch channel voice signal, and, for the voice signals from the other selected voice channels, a sufficiently larger attenuation is applied to the first branch channel voice signal than to the second branch channel voice signal.

8. The voice communication control device according to claim 2, further comprising a voice detection processing unit that monitors the level of the input voice signal passing from the exchange unit through the input channel corresponding to each of the terminals and detects, based on the monitoring result, whether that terminal is speaking, wherein the signal processing control unit determines the main speaker based on the presence or absence of speech detected by the voice detection processing unit and controls the speaker selecting unit based on the determination result.

9. The voice communication control device according to claim 3, further comprising a voice detection processing unit that monitors the level of the input voice signal passing from the exchange unit through the input channel corresponding to each of the terminals and detects, based on the monitoring result, whether that terminal is speaking, wherein the signal processing control unit determines the input channel of the main speaker based on the presence or absence of speech detected by the voice detection processing unit and, based on the determination result, updates the parameter groups used for the signal processing of the sound image control unit.

10. The voice communication control device according to claim 2, wherein Q voice signal processing units, each including the channel branching unit, the sound image control unit, and the addition unit, are provided, Q being an integer of 2 or more; the terminal-corresponding branching unit branches the K-channel added voice signals from the Q voice signal processing units in correspondence with the N terminals; the device further comprises conference selection units, each of which selects, under the control of the signal processing control unit, one or more groups from the Q groups of K-channel voice signals distributed for the corresponding terminal by the terminal-corresponding branching unit, adds them for each channel, and outputs the result as one set of K-channel voice signals; and the speaker selecting unit, under control based on a conference participation request signal by the signal processing control unit, outputs the input voice signal from each terminal to the selected voice channel corresponding to the voice signal processing unit of the conference to be joined.

11. The voice communication control device according to claim 10, wherein each of the conference selection units adds, for each channel, the K-channel added voice signals from the one or more voice signal processing units designated by the participation request signal from the corresponding terminal, and outputs the result as the K-channel added voice signals distributed to that terminal.

12. The voice communication control device according to claim 1, further comprising a signal processing control unit that determines, for the K-channel branch voice signals corresponding to each of the terminals, a set of sound image control parameters corresponding to a target spatial position of the sound source that differs for each terminal, wherein the sound image control unit applies the determined set of sound image control parameters to the corresponding K-channel branch voice signals to generate the K-channel sound image control voice signals for each terminal.

13. The voice communication control device according to claim 10, wherein the number of branches of the channel branching unit is 2, and each set of the sound image control parameters is a pair of acoustic transfer functions.

14. The voice communication control device according to claim 10, 11, 12 or 13, wherein the signal processing control unit detects participation request signals from the terminals to obtain the number of terminals participating in the communication conference, determines the target spatial position for each participating terminal based on the number of participating terminals, determines the sound image control parameters corresponding to each determined target spatial position, and supplies them to the sound image control unit.

15. The voice communication control device according to claim 14, wherein the signal processing control unit detects the number N of terminals connected by the exchange unit and determines the target spatial positions, different for each terminal, symmetrically at intervals of 180/(N-1) degrees.
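
A minimal sketch of this position rule follows, reading the interval as 180/(N-1) degrees in line with claims 23 and 26. The placement of the positions on a frontal arc from -90 to +90 degrees, centred on 0 degrees, is an assumption made for the example, as is the function name `target_positions`.

```python
def target_positions(num_terminals):
    """Azimuths in degrees, spaced 180/(N-1) apart and symmetric about 0.

    The -90..+90 degree frontal arc is an assumed convention. Fewer than
    two terminals is outside the claimed range and simply maps to 0.
    """
    if num_terminals < 2:
        return [0.0] * num_terminals
    step = 180.0 / (num_terminals - 1)
    return [-90.0 + index * step for index in range(num_terminals)]


three = target_positions(3)
five = target_positions(5)
```

Recomputing the list whenever a terminal joins or leaves gives the update behaviour described in claim 22.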

16. The voice communication control device according to claim 1, further comprising a subtraction unit that cancels, from the K-channel added voice signals distributed for each terminal by the terminal-corresponding branching unit, the component of the K-channel sound image control voice signals from the sound image control unit that corresponds to that terminal.
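
The cancellation in this claim amounts to a per-channel, per-sample subtraction, sketched below. The function name `remove_own_voice` and the sample values are hypothetical; the example assumes the terminal's own sound image control voice signal is available in the same K-channel layout as the added signal.

```python
def remove_own_voice(added, own_contribution):
    """Subtract a terminal's own K-channel contribution from the mix.

    added: K lists of samples (the K-channel added voice signal).
    own_contribution: K lists of samples produced by the sound image
    control unit for this terminal's own input.
    """
    return [
        [mixed - own for mixed, own in zip(added_channel, own_channel)]
        for added_channel, own_channel in zip(added, own_contribution)
    ]


added = [[1.0, 0.5], [1.0, 1.5]]   # K = 2 channels, two samples each
own = [[0.0, 0.0], [1.0, 1.0]]     # this terminal's own processed voice
heard = remove_own_voice(added, own)
```

The result is what the terminal's user would hear: everyone else's voice, without an echo of their own.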

17. The voice communication control device according to claim 1, further comprising a multiplexing unit that multiplexes the K-channel added voice signals distributed for each terminal by the terminal-corresponding branching unit into one channel and supplies the multiplexed signal to the exchange unit.

18. The voice communication control device according to claim 1, wherein Q sets of the addition unit and the terminal-corresponding branching unit are provided, Q being an integer of 2 or more, and the voice communication control device further comprises: a terminal combination assigning unit that supplies the K-channel sound image control voice signals corresponding to each terminal from the sound image control unit to one or more designated addition units; and an inter-combination adding unit that adds, for each channel, the K-channel added voice signals distributed from one or more of the terminal-corresponding branching units and outputs the result to the corresponding terminal.

19. The voice communication control device according to claim 1, wherein Q sets of the channel branching unit, the sound image control unit, the addition unit, and the terminal-corresponding branching unit are provided, Q being an integer of 2 or more, and the voice communication control device further comprises: a terminal combination assigning unit that supplies the input voice signal from each of the terminals connected by the exchange unit to one or more designated channel branching units; and an inter-combination adding unit that adds, for each channel, the K-channel added voice signals distributed from one or more designated terminal-corresponding branching units and outputs the result to the corresponding terminal.

20. The voice communication control device according to claim 18, wherein the K channels are one left channel and one right channel, and the sound image control unit convolves the left-channel and right-channel branch voice signals corresponding to each of the terminals with a pair of left and right acoustic transfer functions, serving as the sound image control parameters, that correspond to a target spatial position of the sound source differing for each terminal, thereby generating left-channel and right-channel stereo voice signals as the sound image control voice signals for each terminal.
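
The convolution step in this claim can be sketched as below. It assumes the pair of acoustic transfer functions is available as short time-domain impulse responses; the function names and the numeric values are hypothetical and are not measured transfer functions.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def localize(signal, h_left, h_right):
    """Apply a left/right pair of impulse responses to one terminal's voice."""
    return convolve(signal, h_left), convolve(signal, h_right)


voice = [1.0, 2.0]
# Hypothetical responses: the right ear hears the voice one sample later.
left, right = localize(voice, h_left=[1.0, 0.5], h_right=[0.0, 1.0])
```

Choosing a different response pair for each terminal is what places each participant's voice at its own target spatial position.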

21. The voice communication control device according to claim 20, further comprising a signal processing control unit that detects the participation request signals of the terminals to the communication conferences to obtain the total number of terminals participating in the Q communication conferences, determines target spatial positions equal in number to the total number of participating terminals, determines sets of acoustic transfer functions corresponding to those target spatial positions as the sound image control parameters, and supplies them to the sound image control unit.

22. The voice communication control device according to claim 14, wherein, each time the number of participants changes due to a participation request or a cancellation of participation in the communication conference, the signal processing control unit updates the target spatial positions based on the new number of participants, updates the transfer functions according to the updated target spatial positions, and sets the updated transfer functions in the sound image control unit.

23. The voice communication control device according to claim 21, wherein, where M denotes the number of terminals participating in all the communication conferences, the signal processing control unit determines the target spatial positions symmetrically at intervals of 180/(M-1) degrees.

24. The voice communication control device according to claim 19, wherein the plurality of channels are one left channel and one right channel, and the sound image control unit corresponding to each communication conference convolves the left-channel and right-channel branch voice signals corresponding to each of the terminals with a pair of left and right acoustic transfer functions, serving as the sound image control parameters, that correspond to a target spatial position of the sound source differing for each terminal, thereby generating left-channel and right-channel stereo voice signals as the sound image control voice signals for each terminal.

25. The voice communication control device according to claim 24, further comprising a signal processing control unit that calculates the number of participating terminals of each communication conference based on participation request signals from the terminals to the respective communication conferences, determines, for each communication conference, pairs of acoustic transfer functions corresponding to target spatial positions equal in number to the participating terminals of that conference, and sets them in the sound image control unit corresponding to that communication conference.

26. The voice communication control device according to claim 25, wherein, where M_P denotes the number of participating terminals of each communication conference, the signal processing control unit sets the target spatial positions symmetrically at intervals of 180/(M_P-1) degrees.

27. The voice communication control device according to claim 25, wherein, each time the number of participating terminals of any communication conference changes during that conference due to a new participation request or a cancellation of participation, the signal processing control unit updates the target spatial positions of that communication conference based on the new number of participating terminals, updates the acoustic transfer functions according to the updated target spatial positions, and sets them in the corresponding sound image control unit.

28. The voice communication control device according to claim 1, further comprising: a plurality of switches that are inserted in series in the respective input channels to pass or block the input voice signals; and a voice detection unit that determines, from the input voice signal on each of the input channels, whether the corresponding terminal is in the utterance state, causes the switch of an input channel determined to be in the utterance state to pass the input voice signal, and causes the switch of an input channel determined not to be in the utterance state to block the input voice signal.