JPH0463084A

JPH0463084A - Television conference system among scattered sites

Info

Publication number: JPH0463084A
Application number: JP17274490A
Authority: JP
Inventors: Toshihiko Wakahara; 若原　俊彦; Masayuki Miyazawa; 宮澤　正幸
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1990-07-02
Filing date: 1990-07-02
Publication date: 1992-02-28

Abstract

PURPOSE: To convey a sense of presence and help the conference proceed smoothly by flexibly synthesizing and displaying the pictures of a plurality of sites and outputting audio signals from a plurality of speakers in correspondence with the displayed picture. CONSTITUTION: A video multiplex switching/synthesizing unit 29 selects the video signals from the respective lines in time division according to instructions from a control unit 27, performs cut-out or reduced synthesis, and sends the results to video line processing units 12a to 12d, which convert them into codes suited to the transmission lines. Meanwhile, audio signals are decoded by audio line processing units 24a to 24d, which also detect their audio levels, and an audio selective addition/distribution unit 26 performs addition for two channels corresponding to each display picture. The audio is then encoded again by the units 24a to 24d, multiplexed with the video signals, and sent out on lines 5a to 5d. At the terminal, the multiplexed signal is separated into video and audio signals, which are decoded and output to the monitor and the speakers.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to multipoint video conferencing in which a plurality of video conference terminals installed at remote locations are interconnected via a communication network such as B-ISDN (Broadband Integrated Services Digital Network), and a teleconference is held among these terminals.

[Conventional technology]

A conventional multipoint video conference system using a control node such as a conference bridge was configured as shown in FIG. 6. In the figure, 1a to 1d are video conference terminals located at multiple sites, and 2 is a multipoint video conference control device. The control device 2 is installed in the communication network 3, lines 4a to 4d are set up between the control device 2 and the video conference terminals, and the terminals 1a to 1d communicate with one another. Each video conference terminal 1 comprises a video system such as a camera and a monitor, an audio system such as a microphone and a speaker, a network control unit for connecting to the communication network 3, and the like.

Next, the operation of the system of FIG. 6 will be described. First, video conference terminal 1a requests an exchange in the communication network 3 to set up lines, and four lines 4a to 4d of the required transmission capacity are set up between the multipoint video conference control device 2 and the video conference terminals 1a to 1d. Individual lines are thereby set up in a star configuration between the four video conference terminals 1a, 1b, 1c, and 1d and the multipoint video conference control device 2, and communication takes place among these devices. On lines 4a to 4d, the video signal, the audio signal, and a control signal specifying the display screen are multiplexed and carried in full-duplex, two-way communication. As shown on the monitors of video conference terminals 1b and 1c in FIG. 6, a reduced composite screen of the images from the cameras at the other sites can be displayed; as shown on the monitors of video conference terminals 1a and 1d, the screen of a specific site (for example, the speaker's) can be displayed. For audio, the audio from all remote sites is mixed (added), and the multipoint video conference is conducted while listening to this mix. The multipoint video conference control device 2 that implements this is described below.

FIG. 7 shows an example configuration of the multipoint video conference control device 2. In the figure, 11a to 11d are line interfaces, 12a to 12d are video line processing units, 13a to 13d are video reduction units, 14a to 14d are audio line processing units, 15 is a video switching/synthesizing unit, 16 is an audio addition/distribution unit, and 17 is a control unit.

Next, the operation of the system of FIG. 7 will be described. Signals from video conference terminals 1a to 1d are separated by the line interfaces 11a to 11d into video, audio, and control signals; the video signals are passed to the video line processing units 12a to 12d, the audio signals to the audio line processing units 14a to 14d, and the control signals to the control unit 17. First, the video signal from each line is decoded by the video line processing units 12a to 12d and synchronized (horizontal, vertical, and video-frame synchronization) by means of a video memory. The video reduction units 13a to 13d then apply a spatial filter and thin out every other pixel in the vertical and horizontal directions. The video switching/synthesizing unit 15 synchronizes the lines and synthesizes or switches the video signals, which are encoded again by the video line processing units 12a to 12d and sent to the line interfaces 11a to 11d. The choice between a reduced screen and the screen of a specific site is made according to control signals from the video conference terminals 1a to 1d; these control signals are separated by the line interfaces 11a to 11d and processed by the control unit 17, which instructs the video switching/synthesizing unit 15 accordingly. Meanwhile, the audio signals are separated by the line interfaces 11a to 11d and decoded by the audio line processing units 14a to 14d. The audio addition/distribution unit 16 performs (N-1) addition, in which all audio except that of the receiving site itself is added, inserts a loss that depends on the number of sites to prevent howling, and sends the results to the respective audio line processing units 14a to 14d. Here N is the number of sites, which is 4 in this case.

Next, FIG. 8 shows the configuration of the audio addition/distribution unit 16.

As shown in the figure, the audio signals of all sites are first added together; the audio of the receiving site itself is then removed, and a loss is inserted based on the site-count information from the control unit 17, to produce the audio signal sent to each site.
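The FIG. 8 processing (add everything, remove the receiving site's own audio, apply a loss) can be illustrated with a minimal Python sketch. The function name `n_minus_one_mix`, the per-site sample dictionary, and the `loss_gain` parameter are assumptions made for illustration; the source does not give the loss value used in the conventional device.

```python
def n_minus_one_mix(samples, loss_gain=1.0):
    """Return, for each site, the sum of all OTHER sites' samples.

    samples   -- dict mapping a site name to its current audio sample
    loss_gain -- hypothetical linear gain (<= 1.0) standing in for the
                 inserted loss; the real value is not given in the source
    """
    total = sum(samples.values())              # add the audio of all sites once
    return {
        site: (total - own) * loss_gain        # remove own audio, then apply loss
        for site, own in samples.items()
    }


# Four sites (N = 4), integer samples chosen only to keep the arithmetic obvious.
mixes = n_minus_one_mix({"a": 1, "b": 2, "c": 3, "d": 4})
```

Adding once and subtracting per site mirrors the order described for FIG. 8 and needs one sum rather than N separate sums.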

In FIG. 8, the input audio signal switches are opened and closed under instruction from the control unit 17 in response to changes in the number of sites. The added audio signals are returned to the audio line processing units 14a to 14d, where the loss is applied and the signals are converted into a code suited to the transmission path; they are then multiplexed with the video signals and other signals in the line interfaces 11a to 11d and sent to the video conference terminals 1a to 1d. Switching the screen at the terminal every time the speaker changes is tedious, so to improve the man-machine interface there is a method that detects the audio signal levels and automatically displays the speaker's screen. In this case, the speaker is identified from the level detection results of the audio line processing units 14a to 14d and reported to the control unit 17, which signals the video switching/synthesizing unit 15 to switch the screen automatically.

[Problem to be solved by the invention]

As described above, the only audio processing performed by the multipoint video conference control device 2 is simple addition that excludes the receiving site's own audio ((N-1) addition). Consequently, when a conference is held while viewing a reduced composite screen or a cut-out composite screen on the video conference terminals 1a to 1d, the audio from every site is heard from the same speaker. In addition, on a reduced screen the picture of each remote site is small, so it is difficult to identify who is speaking, and the conference lacks a sense of presence.

The present invention has been made in view of these points. Its object is to obtain a multipoint video conference system in which participants can easily infer at which video conference terminal someone is speaking, and which conveys a sense of presence.

[Means to solve the problem]

To achieve this object, the present invention comprises: transmission means for transmitting cut-out position information and composition position information for the screen of each site, as a control signal, from a video conference terminal to a conference control node; cut-out and synthesis means for cutting out part of the screen of each site based on the cut-out position information and synthesizing the screens based on the composition position information; determination means for determining where on the screen the center of the position at which each site's video is composited lies; and addition means for performing a plurality of audio addition processes based on the determination result of the determination means. When the picture is displayed on the monitor of the video conference terminal, the addition means causes the added audio signals to be output from a plurality of speakers based on the control signal.

[Effect]

In the multipoint video conference system according to the present invention, in order to display the video from multiple sites simultaneously, the screen of each site is reduced or partly cut out, and the results are synthesized into a single screen. A plurality of audio signals are selectively added in correspondence with this composite screen and sent to the terminal using a plurality of lines. The terminal is provided with a plurality of speakers; it receives the screen composition information and outputs each audio signal from the speaker corresponding to its position on the screen.

[Example]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of a multipoint video conference system according to the present invention. In the figure, 5a to 5d are lines serving as transmission means with sufficient capacity to carry video, audio, and other signals; 6a to 6d are video conference terminals; and 7 is a multipoint video conference control device. Within the multipoint video conference control device 7, 12a to 12d are video line processing units, 13a to 13d are video reduction units, 24a to 24d are audio line processing units, 26 is an audio selective addition/distribution unit serving as the addition means, 27 is a control unit serving as the determination means, 28a to 28d are line interfaces, and 29 is a video multiplex switching/synthesizing unit serving as the cut-out and synthesis means.

The operation of the system of FIG. 1 is described next. Assume that a moderator is at terminal 6a and that a speaker is talking at terminal 6b. Each of the video conference terminals 6a to 6d makes the following designation by means of a screen control signal. Terminal 6a designates a composite screen of site Pb (the speaker at terminal 6b), its own site Pa, and the other sites Pc and Pd. Terminal 6b, where the speaker is talking, designates a composite screen of site Pa (the moderator at terminal 6a) and the other sites Pc and Pd. Terminal 6c designates an automatically switched screen showing the speaker at terminal 6b. Terminal 6d designates a composite screen of the speaker's site Pb and the other sites Pa and Pc. Each terminal also specifies the position information used when part of a site's screen is cut out and the position information used when the screens are composited.
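One way to picture the content of such a screen control signal is as a small record per terminal. The following Python sketch is purely illustrative: the class names `SiteLayout` and `ScreenControlSignal`, the field names, the mode strings, and all coordinates are assumptions, since the source does not define a signal format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical rectangle encoding: two diagonal corners (x1, y1, x2, y2).
Rect = Tuple[int, int, int, int]


@dataclass
class SiteLayout:
    site: str        # site whose picture is shown, e.g. "Pb"
    cut_out: Rect    # cut-out position information in the source picture
    placement: Rect  # composition position information in the composite


@dataclass
class ScreenControlSignal:
    terminal: str    # requesting terminal, e.g. "6d"
    mode: str        # assumed values: "composite" or "auto_switch"
    layouts: List[SiteLayout] = field(default_factory=list)


# Terminal 6c asks for the automatically switched speaker screen.
signal_6c = ScreenControlSignal(terminal="6c", mode="auto_switch")

# Terminal 6d asks for a composite of Pb, Pa and Pc (coordinates are made up).
signal_6d = ScreenControlSignal(
    terminal="6d",
    mode="composite",
    layouts=[
        SiteLayout("Pb", (0, 0, 12, 6), (0, 0, 12, 6)),
        SiteLayout("Pa", (0, 0, 6, 3), (0, 6, 6, 9)),
        SiteLayout("Pc", (0, 0, 6, 3), (6, 6, 12, 9)),
    ],
)
```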

The video signals from the cameras of video conference terminals 6a to 6d, the audio signals from their microphones, and the screen control signals described above are converted into codes suited to the transmission paths 5a to 5d, multiplexed, and transmitted; the multipoint video conference control device 7 receives them at the line interfaces 28a to 28d. The line interfaces 28a to 28d separate them into video, audio, and control signals, which are passed to the video line processing units 12a to 12d, the audio line processing units 24a to 24d, and the control unit 27, respectively.

First, the video signals are decoded by the video line processing units 12a to 12d and synchronized (horizontal, vertical, and video-frame synchronization) by means of a video memory. The video reduction units 13a to 13d apply a filter and then reduce the picture by thinning out every other pixel in the vertical and horizontal directions. The video multiplex switching/synthesizing unit 29 then selects the video signal from each line in time division, based on instructions from the control unit 27 (the screen position information described later) and by on/off control of the matrix switch described later, thereby performing cut-out or reduced synthesis. The result is sent to the video line processing units 12a to 12d, which encode it in a form suited to the transmission path.

Meanwhile, the audio signals are decoded by the audio line processing units 24a to 24d, and their audio levels are detected. Based on instructions from the control unit 27, the audio selective addition/distribution unit 26 performs addition for two channels (left and right audio) corresponding to each display screen. For line 5a, for example, it produces audio b from line 5b, and audio c + d, the sum of audio c from line 5c and audio d from line 5d. The audio is then encoded again by the audio line processing units 24a to 24d, multiplexed with the video signals in the line interfaces 28a to 28d, and sent out on lines 5a to 5d. The video conference terminals 6a to 6d receive the multiplexed signal, separate it into video and audio signals, decode them, and output them to the monitor and the speakers, respectively.

FIG. 2 is a block diagram of the video multiplex switching/synthesizing unit 29. The video input signals of the respective lines from the video line processing units 12a to 12d or the video reduction units 13a to 13d are synchronized, and the matrix switch SW is turned on and off based on instructions from the control unit 27 (indicated by the arrow at the lower right of the switch SW). 30a to 30d are superimpose circuits, which superimpose information indicating the site (for example, the characters Pa, Pb, Pc, Pd) on the picture for the video conference terminal. This makes it easy to identify which terminal is which on a composite screen or the like.

In the matrix switch SW, the pixels of each horizontal scan line are timed so as to cut out video from the video line processing units 12a to 12d, or to cut out and synthesize the reduced video signals from the video reduction units 13a to 13d. The cut-out timing is determined from the position information in the screen control signal from the terminal, and synthesis is likewise performed according to the position information from the video conference terminals 6a to 6d.

When the speaker's screen is switched automatically, as at terminal 6c, the control unit 27 detects the speaker by finding the maximum of the audio level detection results from the audio line processing units 24a to 24d. It sends this information to the video multiplex switching/synthesizing unit 29 and controls the matrix switch SW so that the speaker's screen is always displayed.
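The maximum-level rule for picking the speaker can be sketched in a few lines of Python. The function name `detect_speaker`, the level dictionary, and the `threshold` parameter are assumptions for illustration; the source only states that the maximum of the detected levels is used and does not mention a silence threshold.

```python
def detect_speaker(levels, threshold=0.0):
    """Return the site with the highest detected audio level.

    levels    -- dict mapping a site name to its detected audio level
    threshold -- hypothetical floor (not in the source) below which
                 nobody is treated as speaking
    """
    if not levels:
        return None
    site = max(levels, key=levels.get)   # site with the maximum level
    return site if levels[site] > threshold else None


# Levels as they might be reported for lines 5a to 5d (made-up numbers).
speaker = detect_speaker({"a": 0.10, "b": 0.70, "c": 0.20, "d": 0.05})
```

In the described system, the result would be used to drive the matrix switch so that the detected site's picture fills the screen of a terminal such as 6c.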

FIGS. 3(a) to 3(e) illustrate how video signals are synthesized in the matrix switch SW. For the video from line 5a, a rectangular picture whose diagonal corners are given by the cut-out position information (Xa1, Ya1) and (Xa2, Ya2) is cut out according to the screen control signal from the video conference terminal (see FIG. 3(a)). From line 5b, a square picture whose diagonal corners are given by the position information (Xb1, Yb1) and (Xb2, Yb2) is cut out of the reduced signal from the video reduction unit 13b (see FIG. 3(b)). The picture from line 5a is composited into the rectangle with diagonal corners (Xg1, Yg1) and (Xg2, Yg2), and the picture from line 5b into the square at the lower left of the screen with corners (Xg3, Yg3) and (Xg4, Yg4). Likewise, the pictures from lines 5c and 5d are composited into the picture areas with diagonal corners (Xg5, Yg5) and (Xg6, Yg6), and (Xg7, Yg7) and (Xg8, Yg8), at the lower center and lower right respectively (see FIGS. 3(c) and 3(d)). When a reduced picture is cut out, to keep the processing simple, a spatial filter is applied to remove aliasing noise, the pixels in the horizontal and vertical directions are each thinned to 1/2 (every other pixel), and the result is composited as a square.
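The reduce, cut-out, and composite steps can be sketched on a toy picture held as a list of rows. This is a simplified illustration, not the hardware method: the helper names are hypothetical, the anti-aliasing spatial filter is omitted, and the convention that (x1, y1) is inclusive and (x2, y2) exclusive is an assumption.

```python
def decimate_by_two(image):
    """Keep every other pixel in both directions (anti-alias filter omitted)."""
    return [row[::2] for row in image[::2]]


def cut_out(image, x1, y1, x2, y2):
    """Return the rectangle with corner (x1, y1) inclusive and (x2, y2) exclusive."""
    return [row[x1:x2] for row in image[y1:y2]]


def paste(canvas, tile, x, y):
    """Copy tile into canvas in place, with its top-left corner at (x, y)."""
    for dy, row in enumerate(tile):
        canvas[y + dy][x:x + len(row)] = row


# Toy 4x4 source picture whose pixel value encodes its position.
source = [[r * 4 + c for c in range(4)] for r in range(4)]
canvas = [[0] * 4 for _ in range(4)]

reduced = decimate_by_two(source)                  # 2x2 reduced picture
paste(canvas, cut_out(reduced, 0, 0, 2, 2), 0, 2)  # lower left of the composite
```

In the patent the same effect is obtained in real time by switching sources per pixel position rather than by copying buffers.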

FIG. 4 shows the timing of synthesis in the video multiplex switching/synthesizing unit 29. To synthesize the video information of the scan line indicated by the dash-dot line (see FIG. 4(a)), the matrix switch SW is switched on and off relative to the line synchronization signal of that scan line (FIG. 4(c)) so that only the video information of line 5b is taken between Xg3 and Xg4 (FIG. 4(d)), only that of line 5c between Xg5 and Xg6 (FIG. 4(e)), and only that of line 5d between Xg7 and Xg8 (FIG. 4(f)). The same matrix-switch timing control is applied to the other scan lines, and the processing is repeated for every field and frame (see FIG. 4(b)). When the screen is switched automatically to the speaker without synthesis, however, the control unit 27 identifies the speaker by comparing the outputs of the audio detection circuits of the audio line processing units 24a to 24d and passes this information to the video multiplex switching/synthesizing unit 29, and the matrix switch SW (see FIG. 2) is controlled so that it always switches to the screen of the speaker's site.
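The per-scan-line switch timing can be modelled as a table saying which input the matrix switch passes at each pixel position. A minimal sketch, assuming pixel positions count from the line sync pulse, windows are half-open intervals that fit within the scan line, and the function name `line_switch_schedule` is hypothetical:

```python
def line_switch_schedule(windows, line_width):
    """For one scan line, list which input is switched on at each pixel.

    windows    -- dict mapping an input name to its (x_start, x_end) window,
                  with x_start inclusive and x_end exclusive (assumed)
    line_width -- number of pixel positions in the scan line
    Positions covered by no window are None (no input selected).
    """
    schedule = [None] * line_width
    for source, (x_start, x_end) in windows.items():
        for x in range(x_start, x_end):
            schedule[x] = source        # switch for this input is ON here
    return schedule


# A 12-pixel scan line in the lower part of the composite: 5b, 5c and 5d
# side by side (Xg3..Xg4, Xg5..Xg6, Xg7..Xg8 replaced by made-up numbers).
schedule = line_switch_schedule({"5b": (0, 4), "5c": (4, 8), "5d": (8, 12)}, 12)
```

Repeating this for every scan line of a field yields the full composite, which is the role FIG. 4 assigns to the timing control.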

Next, the audio processing in this embodiment is described with reference to FIG. 1. The audio signals separated by the line interfaces 28a to 28d are decoded by the audio line processing units 24a to 24d, which detect the audio level of each line; at the same time, the audio selective addition/distribution unit 26 adds the audio in correspondence with the screen displayed at each terminal. In the example of FIG. 1, for terminal 6a, audio b is selected for the left channel, and audio c and audio d are added to give c + d for the right channel.

For terminal 6b, audio a, c, and d are added for the left channel, and audio a and d are added for the right channel. In the same way, two-channel audio addition is performed for terminals 6c and 6d based on the position information of the screens they display.
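The selective two-channel addition for one terminal can be sketched as follows, using the terminal 6a case from the text (left = b, right = c + d). The function name `two_channel_mix` and the sample dictionary are assumptions for illustration, and loss insertion is left out here.

```python
def two_channel_mix(samples, left_sites, right_sites):
    """Return (left, right) sums for one receiving terminal.

    samples     -- dict mapping a site name to its current audio sample
    left_sites  -- sites whose pictures are treated as being on the left
    right_sites -- sites whose pictures are treated as being on the right
    """
    left = sum(samples[s] for s in left_sites)
    right = sum(samples[s] for s in right_sites)
    return left, right


samples = {"a": 1, "b": 2, "c": 3, "d": 4}   # made-up integer samples

# Terminal 6a: audio b on the left channel, audio c + d on the right channel.
left_6a, right_6a = two_channel_mix(samples, ["b"], ["c", "d"])
```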

FIG. 5 shows the configuration of this audio selective addition/distribution unit. As shown in the figure, the audio information of all sites is first added together and the audio of the receiving site itself is subtracted to give the (N-1) sum; loss insertion control corresponding to the number of sites and audio addition for two channels are then performed automatically based on instructions from the control unit 27. These are implemented by the circuits drawn in solid lines and in broken lines, respectively. The former circuit inserts a loss of 20 log(M-1) dB to prevent howling, where M is the number of sites added.
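As a worked example of the loss formula, the sketch below evaluates 20 log(M-1) dB and the equivalent linear amplitude gain. It assumes the logarithm is base 10 (the usual convention for decibels, not stated explicitly in the source) and that M is at least 2; the function names are hypothetical.

```python
import math


def howling_loss_db(m):
    """Loss in dB for M added sites: 20 * log10(M - 1), assuming M >= 2."""
    if m < 2:
        raise ValueError("formula is only defined here for M >= 2")
    return 20 * math.log10(m - 1)


def loss_as_gain(m):
    """Linear amplitude gain equivalent to the loss, i.e. 1 / (M - 1)."""
    return 10 ** (-howling_loss_db(m) / 20)


loss_4 = howling_loss_db(4)   # about 9.54 dB when M = 4
gain_4 = loss_as_gain(4)      # about 1/3
```

Under this reading, the loss scales the sum of M-1 remote signals back to roughly the amplitude of a single signal.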

The latter circuit consists of switch circuits and subtraction circuits, and its connections are changed adaptively under instruction from the control unit 27. That is, the control unit 27 decides between the left and right of the two channels according to whether the center of gravity of the composited picture lies on the left or the right side of the screen. In this way, based on the screen control signals from the video conference terminals 6a to 6d, addition for two channels is performed in correspondence with the displayed screen. The results are encoded by the audio line processing units 24a to 24d, multiplexed with the video signals in the line interfaces 28a to 28d, and transmitted to each video conference terminal over lines 5a to 5d. At the terminal, the two audio channels are reproduced through two speakers, so that the audio is heard in correspondence with the screen. At terminal 6a, for example, audio b is output from the left speaker and audio c and d from the right speaker. When an automatically switched display is used, as at video conference terminal 6c, the audio detection circuits of the audio line processing units 24a to 24d detect the speaker, the screen of terminal 6b (the speaker's screen) is displayed by automatic switching, the audio signals of the remote sites are combined by (N-1) addition, and this audio signal is output from both speakers.
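The left/right decision from the center of gravity of each composited picture can be sketched as below. The function names, the rectangle encoding, and the screen width are assumptions; in particular, sending a picture whose center lies exactly on the screen's midline to the right channel is an assumed tie-break, chosen only because it reproduces the terminal 6a example (b on the left, c and d on the right).

```python
def channel_for_rect(x1, y1, x2, y2, screen_width):
    """Return "left" or "right" from the horizontal center of a rectangle."""
    centre_x = (x1 + x2) / 2
    # Assumed tie-break: a center exactly on the midline counts as "right".
    return "left" if centre_x < screen_width / 2 else "right"


def assign_channels(layout, screen_width):
    """Group remote sites by the channel their composited picture maps to.

    layout -- dict mapping a site name to its placement rectangle
              (x1, y1, x2, y2) in the composite screen
    """
    sides = {"left": [], "right": []}
    for site, rect in layout.items():
        sides[channel_for_rect(*rect, screen_width)].append(site)
    return sides


# Terminal 6a, 12-pixel-wide toy screen: b lower left, c lower center,
# d lower right (coordinates are made up).
sides_6a = assign_channels(
    {"b": (0, 6, 4, 9), "c": (4, 6, 8, 9), "d": (8, 6, 12, 9)},
    screen_width=12,
)
```

The resulting groups are what the two-channel addition would sum for the left and right speakers of that terminal.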

[Effect of the invention]

As described above, according to the present invention, the screens of a plurality of sites are flexibly synthesized and displayed, and the audio signals are output from a plurality of speakers (for example, left and right) in correspondence with the display screen. This has the advantage that a sense of presence is conveyed and the conference proceeds smoothly.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of a multipoint video conference system according to the present invention; FIG. 2 is a block diagram of the video multiplex switching/synthesizing unit of the system of FIG. 1; FIG. 3 illustrates the method of synthesizing video signals in the system of FIG. 1; FIG. 4 is a time chart showing the timing of video signal synthesis in the system of FIG. 1; FIG. 5 is a block diagram of the audio selective addition/distribution unit of the system of FIG. 1; FIG. 6 is a block diagram of a conventional multipoint video conference system; FIG. 7 is a block diagram of the multipoint video conference control device of the system of FIG. 6; and FIG. 8 is a block diagram of a conventional audio addition/distribution unit.

3: communication network; 5a to 5d: lines; 6a to 6d: video conference terminals; 7: multipoint video conference control device; 12a to 12d: video line processing units; 13a to 13d: video reduction units; 24a to 24d: audio line processing units; 26: audio selective addition/distribution unit; 27: control unit; 28a to 28d: line interfaces; 29: video multiplex switching/synthesizing unit.

Claims

[Claims]

A multipoint video conference system in which a centralized conference control node interconnects video conference terminals installed at three or more sites to hold a video conference among the sites, the monitor screen of each video conference terminal displaying either a composite screen made up of reduced screens of the sites or the screens of the sites switched one at a time, the system comprising: transmission means for transmitting cut-out position information and composition position information for the screen of each site, as a control signal, from the video conference terminal to the conference control node; cut-out and synthesis means for cutting out part of the screen of each site based on the cut-out position information and synthesizing the screens based on the composition position information; determination means for determining where on the screen the center of the position at which each site's video is composited lies; and addition means for performing a plurality of audio addition processes based on the determination result of the determination means, wherein the addition means causes the added audio signals to be output from a plurality of speakers based on the control signal when the picture is displayed on the monitor of the video conference terminal.